whtlnv

Reputation: 2207

Regex: Capturing groups by newline

I'm stuck with a regex search I'd like to do. Suppose the following list (mind the newlines):

Iowa

Big Store
5 Washington Blvd W.
555-123-456

Market 42
721 23th St. S
555-789-123


New York

Cool Café
23 5th Ave. 
123-456-789


Colorado

Pet Shop
1225 Hot St. N.
654-897-215

Discount Inn
25 Lincoln Rd.
456-987-321

Location 6
Address 6
Telephone 6

So, I figured I would use the \n (newlines) to capture the state first, and then all the following locations with their address and telephone number. This is my last working iteration:

(\n{3}(.*)(?:\n{2}(.*)\n{1}(.*)\n{1}(.*)))

This beauty right there only captures all the states and the FIRST location after each, so I thought 'Adding a + at the end of the non-capturing group should fetch the rest of the locations'. Like this:

(\n{3}(.*)(?:\n{2}(.*)\n{1}(.*)\n{1}(.*))+)

Lies. It didn't. It just breaks.

Am I doing it wrong? How can I make it capture every location between states?

My objective is to gather each group in an array, as in:

locations[0][0][0] -> 'Big Store' 
locations[0][0][1] -> '5 Washington Blvd W.' 
locations[0][0][2] -> '555-123-456' 
...
locations[1][0][0] -> 'Cool Café' 
locations[1][0][1] -> '23 5th Ave.' 
locations[1][0][2] -> '123-456-789' 

Or similar.

Thanks!

Upvotes: 1

Views: 5726

Answers (2)

user140628

I am not entirely sure what you want to do, but I came up with this in regexpal:

(?:(?:^|\n{3})(.*))(?:(?!\n{3})(?:\n{2})(.*)\n(.*)\n(.*))+

That will match a state followed by one or more location blocks.

Hope that helps, Ben

Update

I have updated the regular expression once more so that only the relevant groups are captured. Breaking it down: (?:(?:^|\n{3})(.*)) matches the state at the start of a line or after 3 newlines. This is then followed by one or more location blocks, each made up of exactly 2 newlines (the (?!\n{3}) lookahead rules out 3) and then 3 lines: name, address and telephone.

However, you should note that although this regular expression matches all the locations, in many implementations a repeated capturing group keeps only its last iteration, so only the last location of each state is captured. You may have to do some multi-level matching to capture all locations, or use @anubhava's answer instead.
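
To illustrate the multi-level matching mentioned above, here is a minimal sketch in Python (the question does not say which language is in use, so this is an assumption). The outer pattern is the expression from this answer; the inner `LOCATION` pattern, the `parse` helper and the shortened `TEXT` sample are hypothetical names made up for the example.

```python
import re

# Shortened copy of the list from the question (assumed layout:
# two blank lines between states, one blank line between locations).
TEXT = (
    "Iowa\n\n"
    "Big Store\n5 Washington Blvd W.\n555-123-456\n\n"
    "Market 42\n721 23th St. S\n555-789-123\n\n\n"
    "Colorado\n\n"
    "Pet Shop\n1225 Hot St. N.\n654-897-215\n\n"
    "Discount Inn\n25 Lincoln Rd.\n456-987-321"
)

# Level 1: the expression from this answer. Group 1 is the state;
# groups 2-4 end up holding only the last location of the block.
STATE_BLOCK = re.compile(
    r"(?:(?:^|\n{3})(.*))(?:(?!\n{3})(?:\n{2})(.*)\n(.*)\n(.*))+"
)

# Level 2 (hypothetical helper pattern): a single location block.
LOCATION = re.compile(r"\n{2}(.*)\n(.*)\n(.*)")


def parse(text):
    """Return a list of (state, [[name, address, phone], ...]) pairs."""
    result = []
    for block in STATE_BLOCK.finditer(text):
        state = block.group(1)
        # Re-scan only the part of this match that follows the state name.
        locations = [
            list(loc.groups())
            for loc in LOCATION.finditer(text, block.end(1), block.end())
        ]
        result.append((state, locations))
    return result


if __name__ == "__main__":
    for state, locations in parse(TEXT):
        print(state, locations)
```

The outer match only finds where each state's block starts and ends; the inner pattern is then run over that span alone, which gets around the "last iteration only" behaviour of the repeated group.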

Upvotes: 1

anubhava

Reputation: 784968

This is the regex that should work for you:

([^\n]+)?(?:\n{2}([^\n]+)\n([^\n]+)\n([^\n]+))

Live Demo: http://www.rubular.com/r/GISXu5S2vh
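
For what it's worth, here is a rough sketch of how the matches from this regex could be folded into the nested array from the question. It assumes Python, which the question does not specify; `group_by_state` and the shortened `TEXT` sample are hypothetical names for the example. The idea is that group 1 (the state) is only filled on the first location of each state, so the code carries the current state forward.

```python
import re

# Shortened copy of the list from the question (assumed layout:
# two blank lines between states, one blank line between locations).
TEXT = (
    "Iowa\n\n"
    "Big Store\n5 Washington Blvd W.\n555-123-456\n\n"
    "Market 42\n721 23th St. S\n555-789-123\n\n\n"
    "Colorado\n\n"
    "Pet Shop\n1225 Hot St. N.\n654-897-215\n\n"
    "Discount Inn\n25 Lincoln Rd.\n456-987-321"
)

# The regex from this answer, unchanged.
PATTERN = re.compile(r"([^\n]+)?(?:\n{2}([^\n]+)\n([^\n]+)\n([^\n]+))")


def group_by_state(text):
    """Return (states, locations) where locations[i] belongs to states[i]."""
    states, locations = [], []
    for match in PATTERN.finditer(text):
        state, name, address, phone = match.groups()
        # Group 1 is None for every location after the first one of a
        # state, so a new bucket is opened only when a state shows up
        # (or when nothing has been collected yet).
        if state is not None or not locations:
            states.append(state)
            locations.append([])
        locations[-1].append([name, address, phone])
    return states, locations


if __name__ == "__main__":
    states, locations = group_by_state(TEXT)
    print(states)
    print(locations[0][0][0])
```

With the sample above, `locations[0][0][0]` is 'Big Store' and `locations[1][0][0]` is 'Pet Shop', which is close to the layout asked for in the question.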

Upvotes: 1
