Wednesday, February 17, 2010

Lightweight Javascript and CSS compressor / minifier written in PHP

People who know me well are aware that I have an obsession for minimalism and code elegance. It seems like there are only a few Javascript and CSS compressors available on the Web. A handful of code is written in PHP. And even less - "original" code that isn't a port of JSMin, YUI compressor or Dean Edwards' Packer. So I decided to publish my code, which is really a fragment and an integral part of the PHP Fat-Free Framework and it follows the same GPL3 license.

The basic feature that Javascript and CSS compressors have in common is the ability to strip whitespaces and comments off your files, thus reducing the file size and subsequently using less server bandwidth.

So here's the PHP code:-

function minify($_src) {
 // Buffer output
 ob_start();
 $_time=microtime(TRUE);
 $_ptr=0;
 while ($_ptr<=strlen($_src)) {
  if ($_src[$_ptr]=='/') {
   // Let's presume it's a regex pattern
   $_regex=TRUE;
   if ($_ptr>0) {
    // Backtrack and validate
    $_ofs=$_ptr;
    while ($_ofs>0) {
     $_ofs--;
     // Regex pattern should be preceded by parenthesis, colon or assignment operator
     if ($_src[$_ofs]=='(' || $_src[$_ofs]==':' || $_src[$_ofs]=='=') {
       while ($_ptr<=strlen($_src)) {
       $_str=strstr(substr($_src,$_ptr+1),'/',TRUE);
       if (!strlen($_str) && $_src[$_ptr-1]!='/' || strpos($_str,"\n")) {
        // Not a regex pattern
        $_regex=FALSE;
        break;
       }
       echo '/'.$_str;
       $_ptr+=strlen($_str)+1;
       // Continue pattern matching if / is preceded by a \
       if ($_src[$_ptr-1]!='\\' || $_src[$_ptr-2]=='\\') {
         echo '/';
         $_ptr++;
         break;
       }
      }
      break;
     }
     elseif ($_src[$_ofs]!="\t" && $_src[$_ofs]!=' ') {
      // Not a regex pattern
      $_regex=FALSE;
      break;
     }
    }
    if ($_regex && _ofs<1)
     $_regex=FALSE;
   }
   if (!$_regex || $_ptr<1) {
    if (substr($_src,$_ptr+1,2)=='*@') {
     // JS conditional block statement
     $_str=strstr(substr($_src,$_ptr+3),'@*/',TRUE);
     echo '/*@'.$_str.$_src[$_ptr].'@*/';
     $_ptr+=strlen($_str)+6;
    }
    elseif ($_src[$_ptr+1]=='*') {
     // Multiline comment
     $_str=strstr(substr($_src,$_ptr+2),'*/',TRUE);
     $_ptr+=strlen($_str)+4;
    }
    elseif ($_src[$_ptr+1]=='/') {
     // Multiline comment
     $_str=strstr(substr($_src,$_ptr+2),"\n",TRUE);
     $_ptr+=strlen($_str)+2;
    }
    else {
     // Division operator
     echo $_src[$_ptr];
     $_ptr++;
    }
   }
   continue;
  }
  elseif ($_src[$_ptr]=='\'' || $_src[$_ptr]=='"') {
   $_match=$_src[$_ptr];
   // String literal
   while ($_ptr<=strlen($_src)) {
    $_str=strstr(substr($_src,$_ptr+1),$_src[$_ptr],TRUE);
    echo $_match.$_str;
    $_ptr+=strlen($_str)+1;
    if ($_src[$_ptr-1]!='\\' || $_src[$_ptr-2]=='\\') {
     echo $_match;
     $_ptr++;
     break;
    }
   }
   continue;
  }
  if ($_src[$_ptr]!="\r" && $_src[$_ptr]!="\n" && ($_src[$_ptr]!="\t" && $_src[$_ptr]!=' ' ||
   preg_match('/[\w\$]/',$_src[$_ptr-1]) && preg_match('/[\w\$]/',$_src[$_ptr+1])))
    // Ignore whitespaces
    echo str_replace("\t",' ',$_src[$_ptr]);
  $_ptr++;
 }
 echo '/* Compressed in '.round(microtime(TRUE)-$_time,4).' secs */';
 $_out=ob_get_contents();
 ob_end_clean();
 return $_out;
} 

The program tries to stay away from expensive PCRE regex calls unless absolutely necessary.There are only a handful of variables also:- 2 pointers ($_ptr and $_ofs), a flag ($_regex) and a temporary string variable ($_str) for lookahead. This way we don't exhaust server RAM, specially with potentially large strings to be manipulated. It's a top down parser, analyzing each character in the string one at a time, and outputting data immediately to the buffer. The short code doesn't attempt to obfuscate Javascript. It doesn't also try to rewrite your code to make it even shorter. The additional 0.5-2% compression achieved by shortening CSS rules like margin:10px 0 10px 0; to margin:10px 0; or abbreviating Javascript variables in your code doesn't justify the additional server load and processing time. I believe level-5 gzip-encoding of an already fat-trimmed file for delivery to a compression-aware Web browser is the more efficient way to go.

If you think this code can help you in your project, feel free to use it. But I would recommend you take an even closer look at the PHP Fat-Free Framework, which this code is part of. The framework offers more features like combining Javascript/CSS files, URL-based caching, CAPTCHA image generation, a template engine, HTML forms processor and a SQL database handler - all in a tiny 40Kb file (uncompressed).

If the GPL3 license is not to your liking because of some of its restrictions, that can be arranged. Just holler.

Saturday, November 14, 2009

PHP Fat-Free Framework - Less Hype, More Meat

Too often we see code hyped up as "frameworks". But when it comes to applying them to real-world situations, they fall short and sloppy or at the other end of the scale - are huge beasts that behave like control freaks - which make them unusable or hostile to average programmers.

Some are touted as frameworks yet they act simply as front-end controllers which do nothing more than route URLs to classes, functions or include files. They make programming a bit easier, but lacking in many MVC aspects. Some require a fixed directory structure - an open invitation to hackers. Other procedural frameworks use method chaining (which can be quite long), you'd wonder at times what the right sequence should be. Some are just too bloated with so many features than you'll ever need to use in simple blog or wiki applications.

Most frameworks brag about being "lightweight" - which seems to be a relative term. Does a 50MByte framework that consumes a lot of resources qualify as lightweight? Let's call a spade a spade. A cargo truck is lightweight compared to a jumbo jet. Indeed. But if all we need to do to get from here to there on a "regular" basis is a car with some room to spare, why ride the space shuttle? Extra features are more often overkill, like using a jackhammer to drive a half-inch nail.

It seems like in the name of "improving" their software, developers simply forgot or ignored the fact that frameworks are meant to support structures, that is, make applications easier to fabricate and provide order along with industrial strength - not to be imposing structures themselves. This point is argumentative and has been going on since the days of the pyramid builders. Architects and engineers have gone thru endless debates because the first is focused on artistic freedom, the latter on structural integrity.

At the other extreme, the term "lightweight" also means a framework lacking in features that count the most. So it becomes necessary to integrate a lot of third-party modules just to make up for the most important ingredients.

Whichever the case may be, balance and minimalism in framework design - where elegant architectural patterns, engineering excellence and Formula-1 performance are available - is not a philosophy that's commonplace. However, it does manage to find its niche here and there.
This is where the PHP Fat-Free Framework makes its mark. The minimalist framework is so rooted in its Zen world of construction components, that an entire Web application can be developed in so little, yet streamlined, code. That of course means a lot when we're talking programmer productivity and time-to-deploy.

In fact, the entire Fat-Free command set has only 15 static methods and 6 template directives. It weighs in at a compact 32KB of source code, with inline documentation, but make no mistake about its puny size, it's got everything a Web designer needs to get any kind of job done. You won't see the fancy stuff found in large frameworks. It aims to be usable - not usual. It's very much like a modern compact Javascript toolkit for PHP.

Fat-Free gives you a lot of freedom. It won't change your programming style, only your habits - albeit due to the powerful tools you'll have at your disposal. Despite that, Fat-Free is certainly not an end-all, be-all framework. It's not for everyone - only for those who want raw power behind simplicity.

Saturday, October 31, 2009

Combined Pre-Order and Post-Order Traversal: A Stackless Approach

Here's a slight mod to my previous code which gets rid of the stack altogether. It just relies on the parentNode link present in all nodes in a DOM tree. Performance is comparable. Memory usage is less - of course, we don't have a stack anymore.

public function traverse($root) {
// Start at document root
$_flag=FALSE;
$_node=$root->documentElement;
// Repeat until we reach the top
while ($_node) {
if (!$_flag) {
/* Insert pre-order sequence here */

if ($_node->firstChild) {
// Descend to branch
$_flag=FALSE;
$_node=$_node->firstChild;
continue;
}
}
/* Insert post-order sequence here */

// Reset flag
$_flag=FALSE;
if ($_node->nextSibling)
$_node=$_node->nextSibling;
else {
// Ascend to parent node
$_flag=TRUE;
$_node=$_node->parentNode;
}
}
}

Like it's predecessor, it doesn't need a "visited" flag attached to its nodes. The $_flag variable here is key to how it works. It is a simple toggle to indicate whether we're moving down/right along a branch (FALSE) or ascending the tree (TRUE).

Now we have a state machine without the extra baggage.

Combined Pre-order and Post-order Non-recursive DOM Tree Traversal Algorithm

After a lot of googling around, I discovered that a lot of sites discuss pre-order, in-order and post-order algorithms focused on binary trees. Most of the code are familiar textbook materials. Only a few take up more complicated tree structures like DOM documents.

Both recursive and iterative approaches to tree traversal have their pros and cons. And I'm not here to overemphasize any of these. Instead we'll take a different approach that's as fast, if not better than any other. The algorithm presented here is written in PHP and allows you to combine both pre-order and post-order traversal sequence in a single pass/loop.

We'll be needing a stack that will contain all the nodes we passed along the way as we go through each branch. The traditional method of iterative traversal using a "visited" flag attached to each node has a 1:1 correspondence to the total number of nodes. On the other hand, the use of a stack is proportional to the height (or depth) of the tallest branch.

To determine how we handle each node in the tree, we need some indicator/flag to tell us whether we're traversing downward/right or ascending from the end of a branch (a leaf).

Without much ado:-

public function traverse($tree) {
$_stack=array();
$_flag=FALSE;
$_node=$tree->documentElement;
while ($_node) {
if (!$_flag) {
// insert pre-order code here
if ($_node->firstChild) {
array_push($_stack,$_node);
$_flag=FALSE;
$_node=$_node->firstChild;
continue;
}
}
// insert post-order code here
$_flag=FALSE;
$_node=$_node->nextSibling;
if (!$_node && !empty($_stack)) {
$_node=array_pop($_stack);
$_flag=TRUE;
}
}
}

The traversal strategy is no different from other algorithms, except the way we use our stack and the $_flag indicator (which is really a state indicator and vector).

  1. Initialize stack. Start with a descending vector because we're not traversing the tree at this point from bottom-up.
  2. If not ascending from a child node, save each node in the stack as we pass along the way to the deepest part of the left-most branch.
  3. Once we reach a leaf, we go to the leaf's sibling and repeat step 2.
  4. If there are no more siblings, we pop the stack (effectively going up to the parent node) and repeat step 2.

Where indicated in the code snippet, you can insert both pre-order and post-order processing routines in the loop. This is more efficient than having to traverse a tree twice to do both. But first you need a situation where you'll be needing both. Now we can talk about that at a later time.

You might also want to take a look at the stackless approach here.