Reputation: 2207
I'm stuck with a regex search I'd like to do. Suppose the following list (mind the newlines):
Iowa

Big Store
5 Washington Blvd W.
555-123-456

Market 42
721 23th St. S
555-789-123


New York

Cool Café
23 5th Ave.
123-456-789


Colorado

Pet Shop
1225 Hot St. N.
654-897-215

Discount Inn
25 Lincoln Rd.
456-987-321

Location 6
Address 6
Telephone 6
So, I figured I would use the \n (newlines) to capture the state first, and then all the following locations with their address and telephone number. This is my last working iteration:
(\n{3}(.*)(?:\n{2}(.*)\n{1}(.*)\n{1}(.*)))
This beauty right there only captures all the states and the FIRST location after each, so I thought, 'Adding a + at the end of the non-capturing group should fetch the rest of the locations'. Like this:
(\n{3}(.*)(?:\n{2}(.*)\n{1}(.*)\n{1}(.*))+)
Lies. It didn't. It just breaks.
Am I doing it wrong? How can I make it capture every location between states?
My objective is to gather each group in an array, as in:
locations[0][0][0] -> 'Big Store'
locations[0][0][1] -> '5 Washington Blvd W.'
locations[0][0][2] -> '555-123-456'
...
locations[1][0][0] -> 'Cool Café'
locations[1][0][1] -> '23 5th Ave.'
locations[1][0][2] -> '123-456-789'
Or similar.
Thanks!
Upvotes: 1
Views: 5726
Reputation:
I am not entirely sure what you want to do, but I came up with this in regexpal:
(?:(?:^|\n{3})(.*))(?:(?!\n{3})(?:\n{2})(.*)\n(.*)\n(.*))+
That will match a state with any number of location blocks in between.
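As a rough check of that claim, here is a minimal Python sketch. The sample text and its newline layout (three newlines before a state, two before a location, one between a location's lines) are assumptions based on the question, and Python's `re` module stands in for whatever engine you actually use.

```python
import re

# Hypothetical sample, assuming the layout described in the question:
# "\n\n\n" precedes each later state, "\n\n" precedes each location.
text = (
    "Iowa\n\n"
    "Big Store\n5 Washington Blvd W.\n555-123-456\n\n"
    "Market 42\n721 23th St. S\n555-789-123\n\n\n"
    "New York\n\n"
    "Cool Café\n23 5th Ave.\n123-456-789"
)

pattern = re.compile(
    r"(?:(?:^|\n{3})(.*))(?:(?!\n{3})(?:\n{2})(.*)\n(.*)\n(.*))+"
)

# One match per state; each match spans every location block of that state.
matches = list(pattern.finditer(text))
states = [m.group(1) for m in matches]

for m in matches:
    print(repr(m.group(1)), "match covers", m.group(0).count("\n") + 1, "lines")
```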
Hope that helps, Ben
Update
I have updated the regular expression once more to include and exclude certain groups from capture. Breaking it down, you can see it matches the state at the start of a line or after 3 newlines with (?:(?:^|\n{3})(.*)). This is then followed by one or more blocks, each made of 2 newlines (but not 3) followed by 3 address lines.
However, you should note that while this regular expression matches all the locations, it only captures the last one (in certain implementations). You may have to do some multi-level matching to capture every location, or use @anubhava's answer instead.
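To illustrate both points, here is a sketch in Python, whose `re` module is one of the implementations where a group inside a repeated group keeps only its last iteration. The sample text is an assumed layout, and `STATE_BLOCK` and `LOCATION` are hypothetical names: the first is an adaptation of the regex above that captures the whole run of location blocks as a single group, and the second is applied to that run in a second pass.

```python
import re

# Hypothetical sample, assuming "\n\n\n" before a state and "\n\n" before a location.
text = (
    "Iowa\n\n"
    "Big Store\n5 Washington Blvd W.\n555-123-456\n\n"
    "Market 42\n721 23th St. S\n555-789-123\n\n\n"
    "New York\n\n"
    "Cool Café\n23 5th Ave.\n123-456-789"
)

# The regex from this answer: groups 2-4 sit inside a repeated group,
# so only their last iteration is kept.
original = re.compile(
    r"(?:(?:^|\n{3})(.*))(?:(?!\n{3})(?:\n{2})(.*)\n(.*)\n(.*))+"
)
last_only = original.search(text).groups()
print(last_only)  # state plus the LAST location of that state

# Level 1 (adapted): capture the state and the whole run of location blocks.
STATE_BLOCK = re.compile(r"(?:^|\n{3})(.*)((?:(?!\n{3})\n{2}.*\n.*\n.*)+)")
# Level 2: pull each location out of the captured run.
LOCATION = re.compile(r"\n{2}(.*)\n(.*)\n(.*)")

locations = []
for block in STATE_BLOCK.finditer(text):
    run = block.group(2)
    locations.append([list(loc.groups()) for loc in LOCATION.finditer(run)])

print(locations[0][0][0])  # first location name of the first state
```

With this shape, `locations[state][location][field]` lines up with the indexing asked for in the question.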
Upvotes: 1
Reputation: 784968
This is the regex that should work for you:
([^\n]+)?(?:\n{2}([^\n]+)\n([^\n]+)\n([^\n]+))
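One way to use this regex, sketched in Python under the assumption that it is applied repeatedly (a global search) and that the text follows the newline layout from the question. Each match is one location; the first group is filled only when a state line directly precedes the location, so a new state list is started whenever that group is present. The sample text and variable names are hypothetical.

```python
import re

# Hypothetical sample, assuming "\n\n\n" before a state and "\n\n" before a location.
text = (
    "Iowa\n\n"
    "Big Store\n5 Washington Blvd W.\n555-123-456\n\n"
    "Market 42\n721 23th St. S\n555-789-123\n\n\n"
    "New York\n\n"
    "Cool Café\n23 5th Ave.\n123-456-789"
)

pattern = re.compile(r"([^\n]+)?(?:\n{2}([^\n]+)\n([^\n]+)\n([^\n]+))")

states = []
locations = []
for m in pattern.finditer(text):
    state, name, address, phone = m.groups()
    # In Python an optional group that did not participate is None.
    if state is not None or not locations:
        states.append(state)
        locations.append([])
    locations[-1].append([name, address, phone])

for state, places in zip(states, locations):
    print(state, "->", [place[0] for place in places])
```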
Upvotes: 1