Reputation: 37
My string pattern is as follows:
1233 fox street, omaha NE ,69131-7233
Jeffrey Jones, 666 Church Street, Omaha NE ,69131-72339
Betty Davis, LLC, 334 Aloha Blvd., Fort Collins CO ,84444-00333
,1233 Decker street, omaha NE ,69131-7233
I need to separate the above string into four variables: name, address, city_state, zipcode.
Since the number of commas varies from row to row, I am starting at the right and splitting the field off one piece at a time.
rubular.com says the pattern "(,\\d.........)$"
or the pattern ",\d.........$"
will match the zipcode at the end of the string.
regex101.com, however, finds no match for either of the above patterns.
When I try to separate with:
# need to load pkg:tidyr for the `separate` function
library(tidyr)
separate(street_add, c("street_add2", "zip", sep= ("(,\d.........)$")))
or with:
separate(street_add, c("street_add2", "zip", sep= (",\d.........$")))
In both scenarios, R splits at the first comma in the string.
How do I split the string into segments?
Thank you.
Upvotes: 1
Views: 291
Reputation: 18641
Use
sep=",(?=[^,]*$)"
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  ,                        ','
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^,]*                    any character except: ',' (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    $                        before an optional \n, and the end of the
                             string
--------------------------------------------------------------------------------
  )                        end of look-ahead
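To see the pattern in action, here is a minimal sketch on the four sample rows from the question. The names `df`, `last_comma`, `parts`, and `out` are made up for illustration, and the data frame layout (a single column called `street_add`) is an assumption, since the question does not show the actual data. The base R part runs on its own; the `tidyr` part runs only if the package is installed.

```r
# Sample rows copied from the question; the data frame layout is assumed.
df <- data.frame(
  street_add = c(
    "1233 fox street, omaha NE ,69131-7233",
    "Jeffrey Jones, 666 Church Street, Omaha NE ,69131-72339",
    "Betty Davis, LLC, 334 Aloha Blvd., Fort Collins CO ,84444-00333",
    ",1233 Decker street, omaha NE ,69131-7233"
  ),
  stringsAsFactors = FALSE
)

# A comma followed only by non-comma characters up to the end of the string,
# i.e. the last comma in each row.
last_comma <- ",(?=[^,]*$)"

# Base R check: look-aheads require perl = TRUE in strsplit().
parts <- strsplit(df$street_add, last_comma, perl = TRUE)
zip <- vapply(parts, function(p) p[2], character(1))
print(zip)

# The same pattern passed to tidyr::separate(). Note that the data frame comes
# first and that `sep` is its own argument, outside c(...).
if (requireNamespace("tidyr", quietly = TRUE)) {
  out <- tidyr::separate(df, street_add,
                         into = c("street_add2", "zip"),
                         sep = last_comma)
  print(out)
}
```

In the calls shown in the question, `sep=` sits inside `c(...)`, so it appears to become a third element of the column-name vector rather than the separator, which would leave `separate()` using its default `sep`. That may explain why the split did not happen where expected. The left-hand pieces keep their trailing space (for example `"1233 fox street, omaha NE "`), so `trimws()` may be useful afterwards.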
Upvotes: 2