Need regex help!

tliebeck's picture

I'm trying to create a regular expression to test (and perhaps later parse) CSS borders, i.e., things like "1px solid #abcdef".

Here's what I have so far...not quite working as I'd like. Any help appreciated!

<!DOCTYPE html 
     PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
     
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
 <head>
  <title>DevScratch</title>
  <script type="text/javascript">
    function init() {
        // This is what I thought it should be, but doesn't work:
        var pattern = /^(-?\d+px *)?\b(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?\b(#[0-9a-fA-F]{6})?$/
        
        // Best I can do at the moment...not quite what I want:
        var pattern = /^(-?\d+px *)? ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)? ?(#[0-9a-fA-F]{6})?$/
        
        write("Should be true:");
        write(pattern.test("1px solid #abcdef"));
        write(pattern.test("solid #abcdef"));
        write(pattern.test("1px solid"));
        write(pattern.test("1px #abcdef"));
        write(pattern.test("1px"));
        write(pattern.test("#abcdef"));
        write(pattern.test("solid"));
        write("Should be false:");
        
        write(pattern.test("1px solid#abcdef"));
        write(pattern.test("1pxsolid #abcdef"));
        write(pattern.test("1pxsolid#abcdef"));
        write(pattern.test("solid#abcdef"));
        write(pattern.test("1pxsolid"));
        write(pattern.test("1px#abcdef"));
        
        write(pattern.test("x1px solid #abcdef"));
    }
    
    function write(text) {
        var div = document.createElement("div");
        div.appendChild(document.createTextNode(text));
        document.body.appendChild(div);
    }
    
  </script>
 </head>
 <body onload="init();" style="margin: 0px; padding: 0px;overflow:auto;font-size:10pt;font-family:sans-serif;">
 </body>
</html>

mjablonski's picture

Hi Tod,

How about simplifying the problem? First split along whitespace, then compare each piece against a simple expression.

Please note: the order of the elements in the expression is not checked yet. I don't know if this needs to be done, but it should be possible to check this too.

Cheers, Maik

    function test(expression) {
        var pattern = /^(-?\d+px)$|^(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)$|^(#[0-9a-fA-F]{6})$/
        var items = expression.split(" ");        
        for(var i=0;i<items.length;i++) {           
            if(pattern.test(items[i])==false) {
               return false;
            }
        }
        return true;
    }

    function init() {        
        write("Should be true:");
        write(test("1px solid #abcdef"));
        write(test("solid #abcdef"));
        write(test("1px solid"));
        write(test("1px #abcdef"));
        write(test("1px"));
        write(test("#abcdef"));
        write(test("solid"));
        write("Should be false:");
        
        write(test("1px solid#abcdef"));
        write(test("1pxsolid #abcdef"));
        write(test("1pxsolid#abcdef"));
        write(test("solid#abcdef"));
        write(test("1pxsolid"));
        write(test("1px#abcdef"));
        write(test("x1px solid #abcdef"));
    }
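
A minimal sketch of how the order check mentioned above might be added to the split approach. The names `tokenPatterns` and `testOrdered` are hypothetical and not from the thread; the idea is that each piece may only match a pattern that comes after the one matched by the previous piece (width, then style, then color).

```javascript
// One pattern per slot, in the order the pieces must appear.
var tokenPatterns = [
    /^-?\d+px$/,
    /^(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)$/,
    /^#[0-9a-fA-F]{6}$/
];

function testOrdered(expression) {
    var items = expression.split(" ");
    var next = 0; // index of the first slot the next piece is allowed to fill
    for (var i = 0; i < items.length; i++) {
        var matched = false;
        for (var k = next; k < tokenPatterns.length; k++) {
            if (tokenPatterns[k].test(items[i])) {
                next = k + 1; // later pieces must use later slots
                matched = true;
                break;
            }
        }
        if (!matched) {
            return false;
        }
    }
    return true;
}
```

With this sketch, `testOrdered("1px solid #abcdef")` passes while `testOrdered("solid 1px")` and `testOrdered("1px 1px")` are rejected.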

tliebeck's picture

Not a bad idea, but I'd really like to get it all done in one regex for performance reasons. I want something I can use to check that a CSS border is valid (and in pixels) and, if so, dump it directly from the component to the DOM. Regular expressions are fast, and if I can answer the question with just one call into native code it may yield a significant performance improvement.

tliebeck's picture

I've figured out it's the sharp (#) that's breaking it: it's not a word character, so the word boundary "\b" doesn't match between a space and the "#"...not yet sure what to do.
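
A small illustration of the "\b" behavior described here, using the first pattern from the original post. The variable names are made up for the example.

```javascript
// \b only matches between a word character ([A-Za-z0-9_]) and a non-word
// character (or the start/end of the string).
var hashAfterWord = /\b#abcdef/.test("solid#abcdef");   // "d" then "#": boundary
var hashAfterSpace = /\b#abcdef/.test("solid #abcdef"); // " " then "#": no boundary
var hashAtStart = /^\b#abcdef/.test("#abcdef");         // start then "#": no boundary

// The first pattern from the original post, for comparison.
var firstAttempt = /^(-?\d+px *)?\b(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?\b(#[0-9a-fA-F]{6})?$/;
var acceptsGlued = firstAttempt.test("solid#abcdef"); // true, but should be false
var acceptsColor = firstAttempt.test("#abcdef");      // false, but should be true
```

So "\b" gives exactly the opposite of what is wanted around the color: it is satisfied when the "#" is glued to a word and fails when the "#" follows a space or starts the string.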

tliebeck's picture

Woohoo...got one!

/^(-?\d+px)?(?:^|$|(?= )) ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?(?:^|$|(?= )) ?(#[0-9a-fA-F]{6})?$/

This is pixels only at the moment...which is actually what I want...wouldn't be hard to modify to work with other units for verification though. Indistinguishable from line noise indeed....I love it!

Edit: here's a more generic one that handles non-pixels.... for general purpose use you'd also want to add support for 12-bit colors and named colors though:

/^(-?\d+(?:px|pt|pc|cm|mm|in|em|ex|%))?(?:^|$|(?= )) ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?(?:^|$|(?= )) ?(#[0-9a-fA-F]{6})?$/

Edit^2: Updated expressions for parsing use.

Edit^3: Frickin bbcode smilies.
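
A rough sketch of what "parsing use" could look like with the pixel-only expression above: the three capture groups hold the width, style and color. The function name `parseBorder` and the shape of the returned object are assumptions for illustration only.

```javascript
var borderPattern = /^(-?\d+px)?(?:^|$|(?= )) ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?(?:^|$|(?= )) ?(#[0-9a-fA-F]{6})?$/;

function parseBorder(text) {
    var match = borderPattern.exec(text);
    if (!match) {
        return null; // not a border this pattern accepts
    }
    // Groups that did not take part in the match are undefined; map them to null.
    return {
        width: match[1] || null,
        style: match[2] || null,
        color: match[3] || null
    };
}
```

For example, `parseBorder("1px #abcdef")` yields a width of "1px", no style, and a color of "#abcdef", while `parseBorder("1px solid#abcdef")` returns null. One thing to watch for: every group is optional, so the empty string also matches and comes back with all three fields null.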

mjablonski's picture

Tod Liebeck wrote:

/^(-?\d+(?:px|pt|pc|cm|mm|in|em|ex|%))?(?:^|$|(?= )) ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?(?:^|$|(?= )) ?(#[0-9a-fA-F]{6})?$/

Great job... never thought that this would be possible with regexps...:-)

Cheers, Maik

tliebeck's picture

mjablonski wrote:

Great job... never thought that this would be possible with regexps...:-)

Cheers, Maik

Thanks, I'm actually fairly new to regular expressions, but am really starting to appreciate them. They're quite fast (given that you can have native code do all the work) and pretty darn powerful.

The test below returned about 1000ms, i.e., one second give or take, on FF2 and IE6 (wine) on a 2.4GHz P4 Ubuntu box. That's 10,000 iterations of 16 tests each, or roughly 160,000 regular expression tests per second:

var pattern = /^(-?\d+(?:px|pt|pc|cm|mm|in|em|ex|%))?(?:^|$|(?= )) ?(none|hidden|dotted|dashed|solid|double|groove|ridge|inset|outset)?(?:^|$|(?= )) ?(#[0-9a-fA-F]{6})?$/
var startTime = new Date().getTime();
for (var i = 0; i < 10000; ++i) {
        pattern.test("1px outset #6f6f8f");
        pattern.test("1px solid #abcdef");
        pattern.test("1px solid #abcdef");
        pattern.test("solid #abcdef");
        pattern.test("1px solid");
        pattern.test("1px #abcdef");
        pattern.test("1px");
        pattern.test("#abcdef");
        pattern.test("solid");
        pattern.test("1px solid #abcdef");
        pattern.test("solid #abcdef");
        pattern.test("1pt solid");
        pattern.test("1pc #abcdef");
        pattern.test("1em");
        pattern.test("#abcdef");
        pattern.test("solid");
}        
var endTime = new Date().getTime();
alert(endTime - startTime);

Yes, I figure it won't be long before echo3 consists only of regexes :) What would be cool is if you could do a quick comparison when achieving the same functionality using simple indexOf() calls. I'm really curious to see those figures.
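
For anyone wanting to try that comparison, here is one possible regex-free validator built on split() and indexOf(), pixels only. It is an untested-for-speed sketch: all function names are hypothetical, no timings have been measured, and it is not guaranteed to agree with the regex on edge cases such as the empty string.

```javascript
var STYLES = " none hidden dotted dashed solid double groove ridge inset outset ";
var DIGITS = "0123456789";
var HEX = "0123456789abcdefABCDEF";

function isPixelWidth(token) {
    // Optional "-", one or more digits, then "px" at the very end.
    var start = token.charAt(0) === "-" ? 1 : 0;
    var end = token.length - 2;
    if (end <= start || token.indexOf("px", end) !== end) {
        return false;
    }
    for (var i = start; i < end; i++) {
        if (DIGITS.indexOf(token.charAt(i)) === -1) {
            return false;
        }
    }
    return true;
}

function isStyle(token) {
    // Surrounding spaces make sure only whole keywords are found.
    return STYLES.indexOf(" " + token + " ") !== -1;
}

function isHexColor(token) {
    if (token.length !== 7 || token.charAt(0) !== "#") {
        return false;
    }
    for (var i = 1; i < 7; i++) {
        if (HEX.indexOf(token.charAt(i)) === -1) {
            return false;
        }
    }
    return true;
}

function testWithIndexOf(expression) {
    var checks = [isPixelWidth, isStyle, isHexColor];
    var items = expression.split(" ");
    var next = 0; // first check the next piece may satisfy (keeps the order)
    for (var i = 0; i < items.length; i++) {
        while (next < checks.length && !checks[next](items[i])) {
            next++;
        }
        if (next === checks.length) {
            return false;
        }
        next++;
    }
    return true;
}

// Returns elapsed milliseconds for 10000 rounds of a few sample inputs.
function timeValidator(validate) {
    var samples = ["1px outset #6f6f8f", "solid #abcdef", "1px", "#abcdef"];
    var startTime = new Date().getTime();
    for (var i = 0; i < 10000; ++i) {
        for (var j = 0; j < samples.length; j++) {
            validate(samples[j]);
        }
    }
    return new Date().getTime() - startTime;
}
```

Calling `timeValidator(testWithIndexOf)` and then `timeValidator` with a function that wraps `pattern.test` should give figures that can be compared on the same machine.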