user990016

Reputation: 3378

How to split a string with a Regex when the pattern "prefix" is variable

I have the following string:

Giants 2 9 : 10 L.Tynes 22 yd . Field Goal ( 4 - - 3 , 1 : 20 ) 0 3 Cowboys 2 1 : 01 K.Ogletree 10 yd . pass from T.Romo ( D.Bailey kick ) ( 7 - 73 , 2 : 33 ) 7 3 Cowboys 3 10 : 24 K.Ogletree 40 yd . pass from T.Romo ( D.Bailey kick ) ( 9 - 80 , 4 : 36 ) 14 3 Giants 3 5 : 11 A.Bradshaw 10 yd . run ( L.Tynes kick ) ( 9 - 89 , 5 : 13 ) 14 10 Cowboys 3 0 : 40 D.Bailey 33 yd . Field Goal ( 8 - 65 , 4 : 31 ) 17 10 Cowboys 4 5 : 57 M.Austin 34 yd . pass from T.Romo ( D.Bailey kick ) ( 8 - 82 , 7 : 06 ) 24 10 Giants 4 2 : 36 M.Bennett 9 yd . pass from E.Manning ( L.Tynes kick ) ( 12 - 79 , 3 : 21 ) 24 17 Time : 2 : 53

The prefix of each substring will be either "Cowboys" or "Giants". Each substring always ends with a right parenthesis ) followed by two numbers.

I can't even imagine what Regex to use. I can use string functions and loop over the string, but a Regex would help me later on. Maybe I could use the split function, but that's over my head.

I suppose I could parse "Cowboys" then "Giants".

Upvotes: 0

Views: 384

Answers (2)

sapht

Reputation: 2829

I don't know ColdFusion, but this does the job in Python:

import re
match = re.findall(r'((Giants|Cowboys)(.(?!Cowboys|Giants))*.)', s, flags=re.DOTALL)

where s is the provided string. re.DOTALL makes . match newline characters as well, which it skips by default. re.findall returns every non-overlapping match (a global search), which reFindAll probably does as well.

The regex does this:

  • Create a spanning group
  • Look for "Giants" or "Cowboys" as the starting string
  • Repeatedly match any character (.) that is not immediately followed by the string "Cowboys" or "Giants", as many times as possible. This consumes everything up to, but not including, the character directly before the next "Cowboys" or "Giants".
  • Match another character.
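The steps above can be sketched as a commented pattern using re.VERBOSE. This is only an illustration: the name TEAM_RECORD and the shortened sample string are made up for the example, and the pattern itself is unchanged.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# TEAM_RECORD is a hypothetical name chosen for this sketch.
TEAM_RECORD = re.compile(
    r"""
    (                          # spanning group: the whole record
      (Giants|Cowboys)         # a team name starts the record
      (.(?!Cowboys|Giants))*   # any character not followed by a team name
      .                        # one more character, just before the next team name
    )
    """,
    re.DOTALL | re.VERBOSE,
)

# Shortened, made-up sample in the same shape as the question's string.
sample = "Giants 2 9 : 10 ) 0 3 Cowboys 2 1 : 01 ) 7 3"

m = TEAM_RECORD.search(sample)
whole_record = m.group(1)   # the full first record
team = m.group(2)           # the team name that opened it
last_repeat = m.group(3)    # only the last repetition of the inner group
print(repr(whole_record), team, last_repeat)
```

Note that a repeated group only keeps its last repetition, which is why the third group holds a single character.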

Since there are three groups, the group you're interested in might be numbered differently in ColdFusion. In Python, re.findall returns one tuple per match, with the spanning group first.

>>> match[0]
('Giants 2 9 : 10 L.Tynes 22 yd . Field Goal ( 4 - - 3 , 1 : 20 ) 0 3', 'Giants', '3')
>>> match[1]
('Cowboys 2 1 : 01 K.Ogletree 10 yd . pass from T.Romo ( D.Bailey kick ) ( 7 - 73 , 2 : 33 ) 7 3', 'Cowboys', '3')
>>> match[2]
('Cowboys 3 10 : 24 K.Ogletree 40 yd . pass from T.Romo ( D.Bailey kick ) ( 9 - 80 , 4 : 36 ) 14 3', 'Cowboys', '3')

I think in most other languages you would address match[1], match[4], match[7], ... instead.
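If only the full records are needed, one option in Python is to make the inner groups non-capturing with (?:...), so re.findall returns plain strings instead of tuples. A minimal sketch, assuming a shortened copy of the question's string (first three records plus the trailing "Time" part):

```python
import re

# Shortened copy of the string from the question.
s = (
    "Giants 2 9 : 10 L.Tynes 22 yd . Field Goal ( 4 - - 3 , 1 : 20 ) 0 3 "
    "Cowboys 2 1 : 01 K.Ogletree 10 yd . pass from T.Romo ( D.Bailey kick ) "
    "( 7 - 73 , 2 : 33 ) 7 3 "
    "Cowboys 3 10 : 24 K.Ogletree 40 yd . pass from T.Romo ( D.Bailey kick ) "
    "( 9 - 80 , 4 : 36 ) 14 3 "
    "Time : 2 : 53"
)

# Non-capturing groups: findall then yields the whole match as a string.
pattern = re.compile(r"(?:Giants|Cowboys)(?:.(?!Cowboys|Giants))*.", re.DOTALL)

# Each match can end with the whitespace before the next team name, so strip it.
records = [m.strip() for m in pattern.findall(s)]

for record in records:
    print(record)
```

With this pattern the last record runs to the end of the string, so it still includes the trailing "Time : 2 : 53", because no further team name follows to stop the match.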

Upvotes: 0

Tim Goodman

Reputation: 23976

I think this RegEx gives what you want:

(Cowboys|Giants).*?\)\s\d+\s\d+

"Cowboys" or "Giants" followed by as few arbitrary characters as possible (.*? is lazy) until you get a right paren, a whitespace character, some digits, another whitespace character, and some more digits.
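For comparison, here is a rough sketch of that pattern run in Python rather than ColdFusion. It assumes a shortened copy of the question's string, and it makes the team group non-capturing so that re.findall returns the whole match:

```python
import re

# Shortened copy of the string from the question.
s = (
    "Giants 2 9 : 10 L.Tynes 22 yd . Field Goal ( 4 - - 3 , 1 : 20 ) 0 3 "
    "Cowboys 2 1 : 01 K.Ogletree 10 yd . pass from T.Romo ( D.Bailey kick ) "
    "( 7 - 73 , 2 : 33 ) 7 3 "
    "Cowboys 3 10 : 24 K.Ogletree 40 yd . pass from T.Romo ( D.Bailey kick ) "
    "( 9 - 80 , 4 : 36 ) 14 3 "
    "Time : 2 : 53"
)

# Team name, then lazily anything, then ") <digits> <digits>".
pattern = re.compile(r"(?:Cowboys|Giants).*?\)\s\d+\s\d+")

plays = pattern.findall(s)

for play in plays:
    print(play)
```

Because the match is anchored on the closing ") number number", the trailing "Time : 2 : 53" is left out, and the ") (" in the middle of a record does not end the match early since "(" is not a digit.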

Upvotes: 1
