Lightweight JavaScript and CSS compressor / minifier written in PHP

People who know me well are aware that I have an obsession with minimalism and code elegance. It seems there are only a few JavaScript and CSS compressors available on the Web. Only a handful of them are written in PHP, and even fewer are "original" code rather than a port of JSMin, YUI Compressor or Dean Edwards' Packer. So I decided to publish my code, which is a fragment and an integral part of the PHP Fat-Free Framework and is released under the same GPL3 license.

The basic feature that JavaScript and CSS compressors have in common is the ability to strip whitespace and comments from your files, which reduces file size and, in turn, the server bandwidth used to deliver them.

So here's the PHP code:

function minify($_src) {
 // Buffer output
 ob_start();
 $_time=microtime(TRUE);
 $_ptr=0;
 while ($_ptr<strlen($_src)) {
  if ($_src[$_ptr]=='/') {
   // Let's presume it's a regex pattern
   $_regex=TRUE;
   if ($_ptr>0) {
    // Backtrack and validate
    $_ofs=$_ptr;
    while ($_ofs>0) {
     $_ofs--;
     // Regex pattern should be preceded by parenthesis, colon or assignment operator
     if ($_src[$_ofs]=='(' || $_src[$_ofs]==':' || $_src[$_ofs]=='=') {
       while ($_ptr<=strlen($_src)) {
       $_str=strstr(substr($_src,$_ptr+1),'/',TRUE);
       if (!strlen($_str) && $_src[$_ptr-1]!='/' || strpos($_str,"\n")) {
        // Not a regex pattern
        $_regex=FALSE;
        break;
       }
       echo '/'.$_str;
       $_ptr+=strlen($_str)+1;
       // Continue pattern matching if / is preceded by a \
       if ($_src[$_ptr-1]!='\\' || $_src[$_ptr-2]=='\\') {
         echo '/';
         $_ptr++;
         break;
       }
      }
      break;
     }
     elseif ($_src[$_ofs]!="\t" && $_src[$_ofs]!=' ') {
      // Not a regex pattern
      $_regex=FALSE;
      break;
     }
    }
    if ($_regex && $_ofs<1)
     $_regex=FALSE;
   }
   if (!$_regex || $_ptr<1) {
    if (substr($_src,$_ptr+1,2)=='*@') {
     // JS conditional block statement
     $_str=strstr(substr($_src,$_ptr+3),'@*/',TRUE);
     echo '/*@'.$_str.$_src[$_ptr].'@*/';
     $_ptr+=strlen($_str)+6;
    }
    elseif ($_src[$_ptr+1]=='*') {
     // Multiline comment
     $_str=strstr(substr($_src,$_ptr+2),'*/',TRUE);
     $_ptr+=strlen($_str)+4;
    }
    elseif ($_src[$_ptr+1]=='/') {
     // Single-line comment
     $_str=strstr(substr($_src,$_ptr+2),"\n",TRUE);
     $_ptr+=strlen($_str)+2;
    }
    else {
     // Division operator
     echo $_src[$_ptr];
     $_ptr++;
    }
   }
   continue;
  }
  elseif ($_src[$_ptr]=='\'' || $_src[$_ptr]=='"') {
   $_match=$_src[$_ptr];
   // String literal
   while ($_ptr<=strlen($_src)) {
    $_str=strstr(substr($_src,$_ptr+1),$_src[$_ptr],TRUE);
    echo $_match.$_str;
    $_ptr+=strlen($_str)+1;
    if ($_src[$_ptr-1]!='\\' || $_src[$_ptr-2]=='\\') {
     echo $_match;
     $_ptr++;
     break;
    }
   }
   continue;
  }
  if ($_src[$_ptr]!="\r" && $_src[$_ptr]!="\n" && ($_src[$_ptr]!="\t" && $_src[$_ptr]!=' ' ||
   preg_match('/[\w\$]/',$_src[$_ptr-1]) && preg_match('/[\w\$]/',$_src[$_ptr+1])))
    // Ignore whitespaces
    echo str_replace("\t",' ',$_src[$_ptr]);
  $_ptr++;
 }
 echo '/* Compressed in '.round(microtime(TRUE)-$_time,4).' secs */';
 $_out=ob_get_contents();
 ob_end_clean();
 return $_out;
}
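
The backtracking step in minify() (deciding whether a slash opens a regex pattern or is something else) can be looked at in isolation. The following is only a simplified sketch of that one rule: the function name looksLikeRegexStart() is hypothetical and not part of the framework, and it checks only the character before the slash, not whether a closing slash follows on the same line as the full code does.

```php
<?php
// Hypothetical helper, not part of the original code: applies the same
// "preceded by parenthesis, colon or assignment operator" rule as minify().
function looksLikeRegexStart(string $src, int $pos): bool {
    // Walk backwards from the slash, skipping spaces and tabs only.
    for ($ofs = $pos - 1; $ofs >= 0; $ofs--) {
        $ch = $src[$ofs];
        if ($ch === ' ' || $ch === "\t") {
            continue;
        }
        // The first significant character decides the outcome.
        return $ch === '(' || $ch === ':' || $ch === '=';
    }
    // Nothing but whitespace before the slash: assume it is not a regex.
    return false;
}

var_dump(looksLikeRegexStart('x = /ab+c/', 4)); // bool(true)
var_dump(looksLikeRegexStart('a / b', 2));      // bool(false)
```

Anything that fails this check falls through to the comment and division-operator handling, which is why the rule is deliberately narrow.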

The program tries to stay away from expensive PCRE regex calls unless absolutely necessary. There are also only a handful of variables: two pointers ($_ptr and $_ofs), a flag ($_regex) and a temporary string variable ($_str) for lookahead. This way we don't exhaust server RAM, especially when manipulating potentially large strings. It's a top-down parser that analyzes the string one character at a time and writes output to the buffer immediately.

The short code doesn't attempt to obfuscate JavaScript, nor does it try to rewrite your code to make it even shorter. The additional 0.5-2% compression achieved by shortening CSS rules like margin:10px 0 10px 0; to margin:10px 0; or abbreviating JavaScript variables in your code doesn't justify the additional server load and processing time. I believe level-5 gzip encoding of an already fat-trimmed file for delivery to a compression-aware Web browser is the more efficient way to go.
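
As a rough illustration of the gzip step, here is a minimal sketch using PHP's zlib extension. The wrapper name gzipForDelivery() and the sample CSS string are assumptions made for this example and are not taken from the framework; only gzencode() itself is a standard PHP function.

```php
<?php
// Hypothetical wrapper (an assumption for illustration): gzip-encodes an
// already minified string at compression level 5. Requires ext-zlib.
function gzipForDelivery(string $minified, int $level = 5): string {
    $gz = gzencode($minified, $level);
    if ($gz === false) {
        throw new RuntimeException('gzencode() failed');
    }
    return $gz;
}

// Sample input standing in for the output of minify().
$css = str_repeat('#boo .bla{margin:10px 0}', 50);
$gz  = gzipForDelivery($css);

// In a real response you would send "Content-Encoding: gzip" only when the
// browser's Accept-Encoding request header lists gzip.
printf("%d bytes minified, %d bytes gzipped\n", strlen($css), strlen($gz));
```

The minifier and gzip complement each other: the first removes bytes the browser never needs, and the second compresses the repetition that remains.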

If you think this code can help you in your project, feel free to use it. But I would recommend you take an even closer look at the PHP Fat-Free Framework, which this code is part of. The framework offers more features like combining JavaScript/CSS files, URL-based caching, CAPTCHA image generation, a template engine, an HTML forms processor and a SQL database handler, all in a tiny 40 KB file (uncompressed).

If the GPL3 license is not to your liking because of some of its restrictions, an alternative license can be arranged. Just holler.


Unknown, December 7, 2010 at 7:15 AM
Excellent work overall!
Found a couple of bugs with this minify process though, illustrated by the following CSS test case:

#boo .bla {
margin:10px -5px 0 10px;
}

When run through the above minifier, two things are wrong:
* "#boo .bla {" becomes "#boo.bla {", which is obviously not the same thing
* "margin:10px -5px 0 10px;" becomes "margin:10px-5px 0 10px;".

Both are invalid, and therefore break layouts.

I have made changes to the above script to fix these issues:

Add another else if block following the "string literal" section:

} else if (($_src[$_ptr] == ' ') &&
    (($_src[$_ptr + 1] == '-') || ($_src[$_ptr + 1] == '.') || ($_src[$_ptr + 1] == '#'))) {
    // fix "Npx -Npx" cases (space followed by -)
    // fix "#boo .bla" cases (space followed by .)
    // same treatment for a space followed by #
    echo $_src[$_ptr];
    $_ptr++;
}

This could probably be implemented better, but it fixes the problems listed above.

Cheers,
Danny

Razor-Sharp Code
