lyncd

Gzip compression levels for static-cached HTML

I was reading through the source of WP Super Cache recently, and noticed that it was using a gzip compression level of “1” (the lowest) to compress its static-cached HTML pages. Level 1? Why not 3 or 6 (the default) or 9?

These pages are compressed and saved once on the server, and then sent many times to users’ browsers. So, what compression level makes the most sense for pre-compressed HTML?

Full disclosure: This whole thing started because I saw that Super Cache is using Level 1 and thought: “Level 1? That’s terrible, no one uses level 1. Everyone knows levels 2-4 are virtually as fast and compress several percentage points better!”

But then I told myself: “Sticks, you are a dumbass, so you are probably wrong. And now you’re even talking to yourself in your avatar name! Better put away the LSD and test it out!”

Whether you use Super Cache, or even the ugly code hairball that is WordPress, doesn’t really matter: the question applies to any HTML that is compressed once and then served from a cache.

Hopefully you’re familiar with how zlib works, and why you’d want to pre-compress HTML and page components (and not just do it on-the-fly with mod_gzip or mod_deflate). The use case here is also a little special because we’re not talking about gzipping the transport stream on every request (mod_deflate) or truly static components (do “gzip -9”, duh), but HTML pages that are cached on the server for an indeterminate number of requests. In addition, the cache can be invalidated at any time by various actions (e.g. user edits page, visitor posts comment, admin feels like it) regardless of time expiry or number of requests.

Anyway, here’s the data for a sample 5,201-byte page. Using gzip level 1 it compresses to 2,049 bytes, a reduction of about 61%. But what we’re concerned with here is the marginal improvement and CPU cost of increasing the gzip level:

Marginal CPU cost and compression gain by zlib compression level, with zlib level 1 as baseline

Level   δ ms    δ file reduced
1       0       0.0%
2       0.05    1.8%
3       0.07    2.1%
4       0.19    4.9%
5       0.23    5.7%
6       0.23    5.9%
7       0.24    6.0%
8       0.28    6.0%
9       0.29    6.0%

The test machine is a medium-slow single-CPU desktop box, running idle.
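If you want to run this kind of measurement on your own pages, here is a rough sketch in Python, whose zlib module wraps the same zlib library. It is not the script used for the numbers in this post. The function names and the stand-in sample page are made up for illustration, and the size reduction is computed relative to the level-1 output, which is an assumption about the baseline. zlib.compress emits a zlib wrapper rather than a gzip one, so absolute sizes differ from gzip output by a small fixed overhead.

```python
import time
import zlib


def measure(data, level, repeats=200):
    """Return (compressed size in bytes, best-of-N time in ms) for one zlib level."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        size = len(out)
    return size, best * 1000.0


def marginal_table(data):
    """Rows of (level, extra ms vs level 1, % smaller than the level-1 output)."""
    results = {level: measure(data, level) for level in range(1, 10)}
    base_size, base_ms = results[1]
    rows = []
    for level in range(1, 10):
        size, ms = results[level]
        rows.append((level, ms - base_ms, 100.0 * (base_size - size) / base_size))
    return rows


if __name__ == "__main__":
    # Stand-in page (hypothetical); swap in the bytes of a real cached HTML file.
    sample = b"<div class='post'><p>Hello, cached world.</p></div>\n" * 100
    for level, d_ms, d_pct in marginal_table(sample):
        print(f"{level}  {d_ms:+.3f} ms  {d_pct:5.1f}%")
```

Taking the best of many runs cuts down on timer noise, which matters when the differences are fractions of a millisecond. Expect different absolute numbers on different hardware and different pages.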

The big takeaway here for the WordPress use case is that the δ millisecond times are all way too small to matter — 1/3 of a millisecond is not significant when WordPress takes 500 ms just to initialize on the same machine!

But a 6% reduction in HTML bandwidth, that is a significant savings.
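To put the trade-off in numbers: the extra CPU is paid once per cache write, while the byte savings are collected on every hit. The back-of-envelope sketch below uses the level 6 figures from this post (0.23 ms extra, 5.9% smaller) and assumes the percentage is relative to the 2,049-byte level-1 output. The hit counts are arbitrary examples, not measurements.

```python
LEVEL1_BYTES = 2049        # level-1 output size for the sample page
EXTRA_MS_LEVEL6 = 0.23     # one-time extra CPU for level 6 vs level 1
REDUCTION_LEVEL6 = 0.059   # 5.9% smaller (assumed relative to level-1 output)


def bytes_saved(hits):
    """Total bytes saved over `hits` responses served from one cached copy."""
    return hits * LEVEL1_BYTES * REDUCTION_LEVEL6


def extra_cpu_ms_per_hit(hits):
    """One-time compression cost spread over `hits` responses."""
    return EXTRA_MS_LEVEL6 / hits


if __name__ == "__main__":
    for hits in (1, 100, 10_000):  # arbitrary example hit counts
        print(f"{hits:>6} hits: {bytes_saved(hits):,.0f} bytes saved, "
              f"{extra_cpu_ms_per_hit(hits):.5f} ms extra CPU per hit")
```

Even if the cache is invalidated after a single hit, the cost is a fraction of a millisecond against roughly 120 bytes saved, and it only gets better from there.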

Conclusion

I’m going to stick with level 6, gzip’s default, in semi-static gzipping applications of this kind. The additional space savings of going to level 9 (0.1% in the table above) equates to only a handful of bytes for a 5K page.

If you’re a WP Super Cache user and want to change the compression level, here’s a patch, or you can edit wp-super-cache/wp-cache-phase2.php yourself:

--- wp-cache-phase2.php-old  2008-11-01 15:26:09.000000000 -0700
+++ wp-cache-phase2.php-new  2008-11-01 15:25:04.000000000 -0700
@@ -236,12 +236,12 @@
    if( $fr2 )
      fputs($fr2, $store . '<!-- super cache -->' );
    if( $gz )
-     fputs($gz, gzencode( $store . '<!-- super cache gz -->', 1, FORCE_GZIP ) );
+     fputs($gz, gzencode( $store . '<!-- super cache gz -->'));
  } else {
    $log = "\n";

    if( $gz || $wp_cache_gzip_encoding ) {
-     $gzdata = gzencode( $buffer . $log . "<!-- Compression = gzip -->", 1, FORCE_GZIP );
+     $gzdata = gzencode( $buffer . $log . "<!-- Compression = gzip -->");
      $gzsize = strlen($gzdata);
    }

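(The patch drops both arguments, so gzencode falls back to its default level, which is zlib’s default of 6. If you don’t want 6, keep the second argument and change the 1 to some other number. The FORCE_GZIP parameter is both optional and the default, so just leave it off.)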
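If your cache layer isn’t PHP, the same compress-once idea looks roughly like this in Python. This is only a sketch: precompress_page is a hypothetical helper, not part of WP Super Cache or any library. One thing to watch is that Python’s gzip.compress defaults to level 9, unlike gzencode, so the level is passed explicitly here.

```python
import gzip


def precompress_page(html, level=6):
    """Gzip a cached page once, at an explicit level, for serving many times."""
    return gzip.compress(html.encode("utf-8"), compresslevel=level)


if __name__ == "__main__":
    page = "<html><body><p>cached page</p></body></html>"
    gz = precompress_page(page)
    # Round-trip check: the stored bytes decode back to the original page.
    assert gzip.decompress(gz).decode("utf-8") == page
    print(len(gz), "bytes at level 6")
```

The caller would write the returned bytes next to the uncompressed cache file and serve them with a Content-Encoding: gzip header to browsers that accept it.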

Filed under: Code.