Vladyslav K

Reputation: 2606

Regex - find string by excluding part of it

I have text: "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"

Is it possible to find the last name and the phone number while excluding the address? The address regex is this: "([A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}" (two capitalized words, house number, comma, postal code and city)

I want the resulting string to be "Walker +123456789012"

Upvotes: 0

Views: 63

Answers (3)

ʰᵈˑ

Reputation: 11375

You can use the following to capture just the surname and the phone number.

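The first part ((\w+\s){3}) matches three words, each followed by a whitespace character. Because the group is repeated, it keeps only its last repetition: the third word plus the whitespace character after it.

The second part (.+?) lazily matches everything in between. It is not captured.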

The third part ((\+?\d+)$) will capture an optional + (phone number prefix) and the rest of the phone number, up to the end of the string.

(\w+\s){3}.+?(\+?\d+)$
  • \1 - The surname (including the whitespace character that follows it)
  • \2 - The phone number

https://regex101.com/r/gqu0tt/4
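A minimal Java sketch of how the two groups could be combined into the requested string. The class name SurnamePhone and the helper extract are made up for this illustration; only the pattern comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
public class SurnamePhone {
    // Same pattern as above; backslashes are doubled for the Java string literal.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+\\s){3}.+?(\\+?\\d+)$");

    // Returns "surname phone", or null when the pattern does not match.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        // Group 1 is the last repetition of (\w+\s), so it still carries the
        // trailing whitespace character; trim it before joining.
        return m.group(1).trim() + " " + m.group(2);
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text));
    }
}
```

Note that this pattern always takes the third word as the surname, so an entry without a middle name would return the first word of the street instead.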

But if the surname and the address are separated by more than one space, you can use

(\w+)\s{2,}.+?(\+?\d+)$
  • \1 - The surname
  • \2 - The phone number

https://regex101.com/r/gqu0tt/5
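A short Java sketch of this variant. It assumes the real input has at least two whitespace characters between the surname and the street (the sample input below was written with three spaces for that reason); the class name WideGapSurnamePhone is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
public class WideGapSurnamePhone {
    // (\w+) is the word directly in front of a run of two or more whitespace characters.
    private static final Pattern PATTERN =
            Pattern.compile("(\\w+)\\s{2,}.+?(\\+?\\d+)$");

    // Returns "surname phone", or null when no wide gap is found.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        // Assumed input: three spaces between "Walker" and "Sint".
        String wide = "Johnny Alan Walker   Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(wide));
    }
}
```

If every separator is a single space, as in the text shown in the question, find() fails and the helper returns null, so this form only helps when the wider gap is really present in the data.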


I've tested these expressions on the Java engine, and they return the correct match.

Upvotes: 1

A_Elric

Reputation: 3568

You could do....

\w+\s+\w+\s+(\w+).*(\+\d+)

And your capture groups should match up pretty well with what you're trying to match...

Essentially this will "disregard" your first and second "words" (first / middle name), capture the third word, and then skip EVERYTHING in between until it reaches the last +, which it captures together with the digits after it.

Live example: https://regex101.com/r/MjJCSv/1
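A rough Java sketch of reading the two groups from this pattern. The class name ThirdWordAndPhone is invented for the example. Unlike a pattern with an optional +, this one requires a literal + in front of the phone number.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
public class ThirdWordAndPhone {
    // Skip two words, capture the third, then let the greedy .* run to the
    // end and backtrack to the last "+" followed by digits.
    private static final Pattern PATTERN =
            Pattern.compile("\\w+\\s+\\w+\\s+(\\w+).*(\\+\\d+)");

    // Returns "surname phone", or null when the pattern does not match.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        String text = "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012";
        System.out.println(extract(text));
    }
}
```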

In theory if your last name and your address will always be separated by more than 1 space you can shorten this a little bit and write it as

(\w+)\s{2,}.*(\+\d+)

Live example of this functionality: https://regex101.com/r/vGGB4z/1

Example implementation of the latter in Java: http://ideone.com/RExAEO

Upvotes: 1

John

Reputation: 2425

This should do what you need, and also doesn't assume three names (works without a middle name present), so it's a little more flexible in case you run into entries for people who don't have a middle name:

.*?(\w+)\s*(?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,}\s*(\+\d+)
  • .*?(\w+)\s* - Capture the last word before the whitespace before the address. .*? will lazily match anything up to the word preceding the address, but not capture it. \s* will match the whitespace between the word and the address.
  • (?:[A-Z]{1,}[a-z]{1,}\s){2}[0-9]{0,4}\,\s{1,}[0-9]{4}\s[A-Z]{2}\s{1,}[a-zA-Z]{1,} - your address regex but using a non-capturing group (?:)
  • \s*(\+\d+) - Captures the + and following numbers. \s* will match the whitespace between the address and the +.

I reused your address regex, but made the capture group non-capturing. Then we match the last word before the address (the last name) using (\w+), and the + and following numbers after the address using (\+\d+).

Here it is in action: https://regex101.com/r/YGiaJT/1
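As a sketch, the same expression could be assembled in Java by keeping the address part in its own constant. The class name AddressAwareExtractor and the constant ADDRESS are hypothetical; the address pattern itself is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is an assumption for this sketch.
public class AddressAwareExtractor {
    // Address regex from the question, wrapped in a non-capturing group.
    private static final String ADDRESS =
            "(?:[A-Z]{1,}[a-z]{1,}\\s){2}[0-9]{0,4}\\,\\s{1,}[0-9]{4}\\s[A-Z]{2}\\s{1,}[a-zA-Z]{1,}";

    // Group 1: last word before the address. Group 2: "+" and the digits after the address.
    private static final Pattern PATTERN =
            Pattern.compile(".*?(\\w+)\\s*" + ADDRESS + "\\s*(\\+\\d+)");

    // Returns "surname phone", or null when the pattern does not match.
    static String extract(String input) {
        Matcher m = PATTERN.matcher(input);
        return m.find() ? m.group(1) + " " + m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract(
                "Johnny Alan Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"));
        // No middle name: the surname is still the word in front of the address.
        System.out.println(extract(
                "Johnny Walker Sint Jansstraat 7, 1012 HG Amsterdam +123456789012"));
    }
}
```

Because the address itself anchors the match, the number of names in front of it does not matter, but the result depends on every address fitting the two-word street format from the question.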

Upvotes: 1
