Minifying CSS

Over the years I’ve come up with a number of principles that allow me to produce clean, readable, yet extensible code. I want to share them, but I also want you to see that they work and to understand how they are applied. In the end I want to see them used “in the wild”, and the average code quality to go up, if ever so slightly.

So what I will be doing is writing a bunch of code examples, designs and comparisons, and later reusing them to demonstrate that certain choices and basic programming principles lead to better code.

Let’s start by writing a way to compact CSS code. There are a number of existing libraries for this, but we will write one from scratch nevertheless.

All we need is a function to take a string (CSS) and return a string (minified CSS). The most basic minification would be to collapse all whitespace and newlines, so let’s start with that:

import re

def cssmin(css):
    return re.sub(r'\s+', ' ', css)

No flags are needed here: '\s' matches any whitespace character, newlines included:

>>> cssmin('{} \n\n ')
'{} '

Now let’s remove comments as well:

def cssmin(css):
    css = re.sub(r'(?s)/\*.*?\*/', '', css)
    css = re.sub(r'\s+', ' ', css)
    return css

The (?s) sets the re.DOTALL flag, so that the dot also matches newlines and the non-greedy '.*?' can consume multiline comments:

>>> cssmin('#header \n\n /*comment\n*/ {}')
'#header {}'
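The question mark in '.*?' matters as much as the flag. As a small illustration (the sample CSS and the variable names below are made up for this sketch), compare the non-greedy pattern with a greedy one on input that contains two comments:

```python
import re

css = "/* first */ a { color: red } /* second\n   comment */ b { color: blue }"

# Non-greedy: each comment is matched and removed on its own.
lazy = re.sub(r"(?s)/\*.*?\*/", "", css)

# Greedy: one match runs from the first /* to the last */,
# swallowing the rule for `a` that sits between the two comments.
greedy = re.sub(r"(?s)/\*.*\*/", "", css)

print(repr(lazy))    # ' a { color: red }  b { color: blue }'
print(repr(greedy))  # ' b { color: blue }'
```

With the greedy version a whole rule silently disappears, which is much worse than a few bytes of leftover comment.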

We’ve already made most of the progress we can get and additional compaction will not yield significant savings in CSS code size, but let’s add a few more rules.
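A rough way to check that claim on your own stylesheets is to compare sizes before and after. This is only a sketch: strip_basic is a hypothetical name for the two-rule version above, and the sample stylesheet is invented, so the numbers it prints say nothing about real-world CSS.

```python
import re


def strip_basic(css):
    """Remove comments and collapse whitespace, nothing else."""
    css = re.sub(r"(?s)/\*.*?\*/", "", css)
    return re.sub(r"\s+", " ", css)


# Invented sample; substitute the contents of a real stylesheet here.
sample = """
/* Page header */
#header {
    color: black;
    margin: 0 auto;
}

/* Body copy */
#content p,
body {
    font-size: 12px;
}
"""

before = len(sample)
after = len(strip_basic(sample))
saved = 100 * (before - after) // before
print(f"{before} -> {after} characters ({saved}% smaller)")
```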

We can remove whitespace at the beginning and the end of the file and around separators. Remember that we’ve already collapsed the whitespace, so any padding is at most one space wide at this point.

def cssmin(css):
    css = re.sub(r'(?s)/\*.*?\*/', '', css)
    css = re.sub(r'\s+', ' ', css)
    css = re.sub(r' ?([{};:,+>]) ?', r'\1', css)
    return css.strip()

Note the use of the backreference \1 in the replacement string: this way we can remove the padding around all of the delimiters in one sweep.

>>> cssmin(' #content p, body { color: black ; }\n')
'#content p,body{color:black;}'

One thing we need to work around is that stripping the spaces around colons turns div :hover (any hovered descendant of a div) into div:hover (the hovered div itself), which changes the meaning of the selector. To avoid that, we’ll take the colon out of the regexp and use the str.replace method to remove only the space after a colon. We’ll use str.replace again to drop the unnecessary semicolon before a closing brace, and we’re almost done.
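To see the problem concretely, this short sketch runs both padding patterns on the same rule (the sample rule and the variable names are only for illustration):

```python
import re

css = "div :hover { color: red }"

# Colon inside the character class: the space before :hover is removed,
# so the descendant combinator is lost.
with_colon = re.sub(r" ?([{};:,+>]) ?", r"\1", css)

# Colon handled separately: only the space *after* a colon is dropped.
without_colon = re.sub(r" ?([{};,+>]) ?", r"\1", css).replace(": ", ":")

print(with_colon)     # div:hover{color:red}
print(without_colon)  # div :hover{color:red}
```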

While we are at it, let’s precompile those rules as well. Here’s the entire implementation:

import re

__all__ = ['cssmin']

_rx_comment = re.compile(r'(?s)/\*.*?\*/')
_rx_whitespace = re.compile(r'\s+')
_rx_padding = re.compile(r' ?([{};,+>]) ?')

def cssmin(css):
    css = _rx_comment.sub('', css)
    css = _rx_whitespace.sub(' ', css)
    css = _rx_padding.sub(r'\1', css)
    return css.replace(': ', ':').replace(';}', '}').strip()

I wanted to add comments to those rules, but check this out: when we decided to precompile them, we also had to give them names, and those names work just as well for documentation purposes, if you ask me.

Now tell me: was this a waste of time, and should we have just used an existing minifier? Of course there are a couple more things we could collapse; we could even go for a parse-optimize-output approach. But seriously, from a practical standpoint, that would be a waste of time (though it could be useful for self-education).

To be fair, this minifier can stumble on some CSS rules: for example, input[value="a, b"] will turn into input[value="a,b"]. But honestly, I don’t really care.
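For anyone who does care, here is one possible workaround, sketched under a few assumptions: quoted strings are swapped for placeholders before minifying and put back afterwards. The names cssmin_keep_strings, _rx_comment_or_string and _rx_placeholder are hypothetical additions, and the sketch assumes the input never contains NUL characters, which serve as placeholder markers.

```python
import re

_rx_comment = re.compile(r"(?s)/\*.*?\*/")
_rx_whitespace = re.compile(r"\s+")
_rx_padding = re.compile(r" ?([{};,+>]) ?")

# Hypothetical addition: a comment, or a quoted string with backslash escapes.
# Matching both in one pass keeps a quote inside a comment from opening a string.
_rx_comment_or_string = re.compile(
    r"(?s)/\*.*?\*/"
    r'|"(?:[^"\\]|\\.)*"'
    r"|'(?:[^'\\]|\\.)*'"
)
_rx_placeholder = re.compile(r"\x00(\d+)\x00")


def cssmin(css):
    css = _rx_comment.sub("", css)
    css = _rx_whitespace.sub(" ", css)
    css = _rx_padding.sub(r"\1", css)
    return css.replace(": ", ":").replace(";}", "}").strip()


def cssmin_keep_strings(css):
    stash = []

    def hide(match):
        token = match.group(0)
        if token.startswith("/*"):
            return ""  # comments are dropped, as before
        stash.append(token)
        # Assumes the stylesheet itself contains no NUL characters.
        return "\x00%d\x00" % (len(stash) - 1)

    css = cssmin(_rx_comment_or_string.sub(hide, css))
    # Put every stashed string back, byte for byte.
    return _rx_placeholder.sub(lambda m: stash[int(m.group(1))], css)


rule = 'input[value="a, b"] { color: red; }'
print(cssmin(rule))               # input[value="a,b"]{color:red}
print(cssmin_keep_strings(rule))  # input[value="a, b"]{color:red}
```

Note that the fix roughly doubles the amount of code to cover a case that rarely comes up, which is exactly the kind of trade-off in question here.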

Here are some other implementations for comparison:

Some points to consider:

  • The difference between having a small, tidy implementation and a bigger one could be the difference between using a 95%-efficient solution and using none at all. The additional complexity needed to reach 100% is sometimes not worth the extra 5%, and sometimes simply not worth it at all; it’s better to do without.
  • Small implementations like the one described here are hardly worth a PyPI listing: the setup.py itself would usually be longer than the code. There’s also no need for docs (although there should still be some tests). The general consequence is that all you can find are heavyweight solutions for everything.
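As for those tests, a handful of plain assertions is probably enough for something this small. The sketch below repeats the final implementation so that it runs on its own; the test function names are made up, and it assumes a pytest-style runner that collects test_* functions (calling them by hand works too).

```python
import re

_rx_comment = re.compile(r"(?s)/\*.*?\*/")
_rx_whitespace = re.compile(r"\s+")
_rx_padding = re.compile(r" ?([{};,+>]) ?")


def cssmin(css):
    css = _rx_comment.sub("", css)
    css = _rx_whitespace.sub(" ", css)
    css = _rx_padding.sub(r"\1", css)
    return css.replace(": ", ":").replace(";}", "}").strip()


def test_strips_comments_and_whitespace():
    assert cssmin("#header \n\n /*comment\n*/ {}") == "#header{}"


def test_strips_padding_and_trailing_semicolon():
    css = " #content p, body { color: black ; }\n"
    assert cssmin(css) == "#content p,body{color:black}"


def test_keeps_space_before_pseudo_class():
    assert cssmin("div :hover { color: red }") == "div :hover{color:red}"
```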