(Ruby) parsing a string with RegEx

Question

This is the string that I want to parse: 2 Sep 27 Sep 28 SOME TEXT HERE 35.00

I want to parse it into a list so that the values look like:

list[0] = 'Sep 28'
list[1] = 'SOME TEXT HERE'
list[2] = '35.00'

The RegEx that I've been working on:

^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)

My values are:

list[0] = 'Sep 28'
list[1] = 'HERE'
list[2] = '35.00'

The list[1] value is off. I'm also probably not parsing the spaces right, but I couldn't find any guidance in the "Pickaxe" book or online.

nohat · Accepted Answer

Your problem is in your second capture group:

([a-zA-Z0-9]*\s{1})+

The parenthesized group is repeated, matching each of the words 'SOME', 'TEXT', and 'HERE' individually, leaving your second capture group with only the final match, 'HERE'.

You need to put the + inside the capturing parenthesized groups, and use non-capturing parentheses (?:...) to enclose your existing group. Non-capturing parentheses, which use (?: to start the group and ) to end the group, are a way in a regular expression to group parts of your match together without capturing the group. You can use repetition operators (+, *, {n}, or {n,m}) on a non-capturing group and then capture the entire expression:

((?:[a-zA-Z0-9]*\s{1})+)

In total:

/^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/

As a side note, this is a pretty clunky regex. You never really need to specify {1} in a regex as a single match is the default. Similarly, \d\d is one character less typing than \d{2}. Also, you probably just want \w instead of [a-zA-Z0-9]. Since you don't seem to care about case, you probably just want to use the /i option and simplify the letter character classes. Something like this is a more idiomatic regular expression:

/^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+.\d+)/i

Finally, though the Ruby documentation for regular expressions is a little thin, Ruby uses somewhat standard Perl-compatible regular expressions, and you can find more information about regular expressions generally at regular-expressions.info

(Ruby) parsing a string with RegEx

Answers (2)

Related Questions