tsurantino

Reputation: 1027

(Ruby) parsing a string with RegEx

This is the string that I want to parse: 2 Sep 27 Sep 28 SOME TEXT HERE 35.00

I want to parse it into a list so that the values look like:

list[0] = 'Sep 28'
list[1] = 'SOME TEXT HERE'
list[2] = '35.00'

The RegEx that I've been working on:

^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)

My values are:

list[0] = 'Sep 28'
list[1] = 'HERE'
list[2] = '35.00' 

The list[1] value is off. I'm also probably not parsing the spaces right, but I couldn't find any guidance in the "Pickaxe" book or online.

Upvotes: 0

Views: 470

Answers (2)

nohat

Reputation: 7291

Your problem is in your second capture group:

([a-zA-Z0-9]*\s{1})+

The parenthesized group is repeated, matching each of the words 'SOME', 'TEXT', and 'HERE' individually, leaving your second capture group with only the final match, 'HERE'.
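This is easy to confirm by running the question's pattern against the sample line and printing the captures. The snippet below is only a sketch; the constant and helper names are made up for illustration and do not come from the question.

```ruby
# The pattern from the question, unchanged. The second group sits under a +,
# so each repetition overwrites what the group captured on the previous pass.
QUESTION_PATTERN = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)/

# Hypothetical helper: returns the capture groups, or nil if the line does not match.
def question_captures(line)
  match = QUESTION_PATTERN.match(line)
  match && match.captures
end

p question_captures('2 Sep 27 Sep 28 SOME TEXT HERE 35.00')
# prints ["Sep 28", "HERE ", "35.00"]
```

Only the last repetition survives in the second group, and it still carries the trailing space matched by \s{1}.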

You need to move the + inside the capturing parentheses and wrap your existing group in non-capturing parentheses, (?:...). A non-capturing group starts with (?: and ends with ), and it groups part of a pattern without capturing it. That lets you apply a repetition operator (+, *, {n}, or {n,m}) to the non-capturing group and then capture the whole repeated expression:

((?:[a-zA-Z0-9]*\s{1})+)

In total:

/^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/
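As a rough sketch of how the corrected pattern could be used to build the list from the question: the helper name parse_line is hypothetical, and the call to strip is an added assumption, since the second group also captures the space that precedes the amount.

```ruby
# The corrected pattern: the repetition now happens inside a non-capturing
# group, and the outer parentheses capture the whole run of words.
FIXED_PATTERN = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/

# Hypothetical helper: returns the three captures with surrounding
# whitespace removed, or nil if the line does not match.
def parse_line(line)
  match = FIXED_PATTERN.match(line)
  return nil unless match
  match.captures.map(&:strip)
end

list = parse_line('2 Sep 27 Sep 28 SOME TEXT HERE 35.00')
p list
# prints ["Sep 28", "SOME TEXT HERE", "35.00"]
```

Without the strip, list[1] would be 'SOME TEXT HERE ' with a trailing space.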

As a side note, this is a pretty clunky regex. You never need to specify {1}, since matching exactly once is the default. Similarly, \d\d is one character shorter than \d{2}. You probably also want \w instead of [a-zA-Z0-9] (note that \w matches the underscore as well). Since you don't seem to care about case, you can use the /i option and simplify the letter character classes. The . before the decimal digits should also be escaped as \., because an unescaped . matches any character. Something like this is a more idiomatic regular expression:

/^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i
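One detail of the amount group is worth checking by hand: whether the dot is escaped. The sketch below compares two variants of the simplified pattern. The constant names, the helper, and the malformed sample line ending in 35x00 are all invented for illustration.

```ruby
# Simplified pattern with the dot left unescaped, as in the question's amount group.
LOOSE  = /^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+.\d+)/i
# Same pattern with the dot escaped, so it only matches a literal period.
STRICT = /^\d [a-z]{3} \d\d ([a-z]{3} \d\d) ((?:\w* )+)(\d+\.\d+)/i

# Hypothetical helper: returns the third capture (the amount), or nil on no match.
def amount(pattern, line)
  match = pattern.match(line)
  match && match[3]
end

good = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'
bad  = '2 Sep 27 Sep 28 SOME TEXT HERE 35x00' # made-up malformed amount

p amount(LOOSE, good)   # prints "35.00"
p amount(STRICT, good)  # prints "35.00"
p amount(LOOSE, bad)    # prints "35x00"
p amount(STRICT, bad)   # prints nil
```

Both variants agree on well-formed input; only the escaped version rejects the malformed amount.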

Finally, although the Ruby documentation for regular expressions is a little thin, Ruby uses somewhat standard Perl-compatible regular expressions, and you can find more general information about them at regular-expressions.info.

Upvotes: 4

Matt

Reputation: 381

You may already have tried this tool, but I would highly recommend Rubular. It lets you test a Ruby regular expression against sample text and see the resulting matches and capture groups right away.

It looks like you already got the specific answer to your question, so I just wanted to drop this in for other people coming by so they can know where to go test their regex or just practice.

Upvotes: 1
