Reputation: 2606
I have text: "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"
Is it possible to extract the last name and the phone number while excluding the address?
The address regex is: "([A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}"
(two capitalized words, house number, comma, postal code and city)
I want the resulting string to be "Walker +123456789012"
Upvotes: 0
Views: 63
Reputation: 11375
You can use the following to capture just the surname and the phone number.
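- The first part, (\w+\s){3}, matches three words, each followed by a whitespace character. The group keeps only its last repetition, so it ends up holding the third word (the surname) plus its trailing whitespace.
- The second part, .+?, lazily matches everything up to the phone number, without capturing it.
- The third part, (\+?\d+)$, captures an optional + (phone number prefix) and the rest of the phone number, up to the end of the string.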
(\w+\s){3}.+?(\+?\d+)$
\1 - The surname
\2 - The phone number
https://regex101.com/r/gqu0tt/4
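As a rough illustration, here is how the expression above might be applied in Java. The class and method names are made up for this sketch. Note that group 1 ends with the whitespace matched by \s, so the sketch trims it before building the result.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SurnamePhoneExtractor {
    // Regex from the answer: three "word + whitespace" repetitions,
    // a lazy gap, then the phone number anchored at the end of the input.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+\\s){3}.+?(\\+?\\d+)$");

    /** Returns "surname phone", or null when the input does not match. */
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Group 1 holds the last repetition ("Walker "), including its
        // trailing whitespace, so trim it.
        return m.group(1).trim() + " " + m.group(2);
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text)); // Walker +123456789012
    }
}
```

This version assumes exactly three name words before the address; a name without a middle name would shift the capture onto the first address word.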
However, if the surname and the address are separated by more than one space, you can use
(\w+)\s{2,}.+?(\+?\d+)$
\1 - The surname
\2 - The phone number
https://regex101.com/r/gqu0tt/5
I've tested these expressions on the Java engine, and they return the correct matches.
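A minimal sketch of the two-or-more-spaces variant, assuming the input really does contain a double space between the surname and the address (the sample strings below are built that way on purpose, and the class name is hypothetical). Since this variant keys on the gap rather than on a word count, it should also cope with a missing middle name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubleSpaceExtractor {
    // Regex from the answer: a word followed by 2+ whitespace characters,
    // a lazy gap, then the phone number at the end of the input.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+)\\s{2,}.+?(\\+?\\d+)$");

    /** Returns "surname phone", or null when no wide gap is found. */
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Assumed input: two spaces between the surname and the address.
        String address = "Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract("Johnny Alan Walker" + "  " + address));
        System.out.println(extract("Johnny Walker" + "  " + address));
    }
}
```

With only single spaces in the input there is no match, so the method returns null instead of guessing.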
Upvotes: 1
Reputation: 3568
You could use:
\w+\s+\w+\s+(\w+).*(\+\d+)
And your capture groups should match up pretty well with what you're trying to match...
Essentially, this skips the first two words (first and middle name), captures the third word (the surname), then skips everything in between until it finds a +, and captures the + together with the digits after it.
Live example: https://regex101.com/r/MjJCSv/1
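If the goal is the final string rather than the separate groups, one possible approach in Java is to feed the same expression to String.replaceAll and rebuild the result from the two groups. This is only a sketch; the class and method names are invented for the example.

```java
final class ReplaceAllExample {
    /**
     * Rewrites the matched text as "surname phone" using the regex from
     * the answer. If nothing matches, replaceAll returns the input unchanged.
     */
    static String surnameAndPhone(String text) {
        // $1 is the third word (surname), $2 is the + and the digits after it.
        return text.replaceAll("\\w+\\s+\\w+\\s+(\\w+).*(\\+\\d+)", "$1 $2");
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(surnameAndPhone(text)); // Walker +123456789012
    }
}
```

Because the match here runs from the first name through the last digit, the replacement covers the whole sample string.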
If the last name and the address are always separated by more than one space, you can shorten this a little and write it as
(\w+)\s{2,}.*(\+\d+)
Live example of this functionality: https://regex101.com/r/vGGB4z/1
Example implementation of the latter in Java: http://ideone.com/RExAEO
Upvotes: 1
Reputation: 2425
This should do what you need. It also doesn't assume three names, so it still works for entries without a middle name:
.*?(\w+)\s*(?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}\s*(\+\d+)
- .*?(\w+)\s* - Captures the last word before the whitespace that precedes the address. .*? lazily matches anything up to the word preceding the address, without capturing it. \s* matches the whitespace between the word and the address.
- (?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,} - Your address regex, but using a non-capturing group (?: ).
- \s*(\+\d+) - Captures the + and the following digits. \s* matches the whitespace between the address and the +.
I reused your address regex, but made its group non-capturing. Then we match the last word before the address (the last name) using (\w+), and the + and following digits after the address using (\+\d+).
Here it is in action: https://regex101.com/r/YGiaJT/1
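For completeness, a small Java sketch of the same idea, with the address part kept in its own constant so it can be swapped out. The class, constant, and method names are assumptions made for this example. It tries the pattern on entries with and without a middle name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AddressAwareExtractor {
    // The asker's address regex, with its group made non-capturing.
    private static final String ADDRESS =
            "(?:[A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,}";

    // Last word before the address (group 1), the address, then the phone (group 2).
    private static final Pattern PATTERN =
            Pattern.compile(".*?(\\w+)\\s*" + ADDRESS + "\\s*(\\+\\d+)");

    /** Returns "surname phone", or null when the input does not match. */
    static String extract(String text) {
        Matcher m = PATTERN.matcher(text);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String address = "Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract("Johnny Alan Walker " + address));
        System.out.println(extract("Johnny Walker " + address));
    }
}
```

The trade-off is that this only works for addresses that fit the asker's pattern (two capitalized words, house number, comma, postal code, city); anything else makes the whole match fail.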
Upvotes: 1