Jonathan Itakpe

Reputation: 45

Regex: Extract and match specific words in between two characters

I need to extract from a string each phrase that contains one of the words way, road, str or street, together with every word before and after it, up to the nearest comma ',' or a number in front.

Sample Strings:
1. Yeet Road, Off Mandy Plant Way, Mando GRA.
2. 3A, Sleek Drive, Off Tremble Rake Street.
3. 57 Radish Slist Road Ikoyi

Result should be as close as possible to:

  1. Yeet Road
  2. Mandy Plant Way
  3. Tremble Rake Street
  4. Radish Slist Road Ikoyi

Based on some Stack Overflow answers, this is what I currently have:
(?<=\,)(.*Way|Road|Str|Street?)(?=\,)

Any help would be appreciated.

Upvotes: 3

Views: 1743

Answers (2)

Casimir et Hippolyte

Reputation: 89547

You can try something like this (with the ignore_case flag):

\b(?:(?!off\b)[a-z]+[^\w,\n]+)*?\b(?:way|road|str(?:eet)?)\b(?:[^\w,\n]+[a-z]+)*
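As a quick check, the pattern can be run against the sample strings from the question. This is only a sketch in Python: the names STREET_RE, samples and extract_streets are made up for illustration, and the list numbering is assumed not to be part of the input strings.

```python
import re

# Pattern from the answer, compiled with the ignore-case flag it asks for.
STREET_RE = re.compile(
    r"\b(?:(?!off\b)[a-z]+[^\w,\n]+)*?"   # optional leading words, "off" not allowed
    r"\b(?:way|road|str(?:eet)?)\b"       # the keyword as a whole word
    r"(?:[^\w,\n]+[a-z]+)*",              # optional trailing words, stopping at a comma
    re.IGNORECASE,
)

# Sample strings from the question (hypothetical variable name).
samples = [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]


def extract_streets(text):
    """Return every substring matched by STREET_RE, in order of appearance."""
    return [m.group(0) for m in STREET_RE.finditer(text)]


for s in samples:
    print(extract_streets(s))
```

On these three inputs the matches are the four results the question asks for; "Off" and the leading numbers are left out.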


However, patterns of this kind, which have to describe a substring of undefined content and undefined length before reaching the literal parts of the pattern (the keywords), are not efficient. This doesn't matter for small strings, but you shouldn't use them on a large string.

To exclude particular words you can change (?!off\b) to (?!off\b|word1\b|word2\b|...)
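One way to manage that list is to build the lookahead from a list of words. The sketch below is an assumption about how this could be wired up in Python, not part of the answer itself: build_street_pattern and the word "opposite" are hypothetical, and the words are grouped as (?!(?:off|word1)\b), which is equivalent to repeating \b after each word.

```python
import re


def build_street_pattern(excluded_words):
    """Compile the answer's pattern with a custom list of excluded leading words.

    excluded_words is assumed to be a non-empty list of plain words.
    """
    blocked = "|".join(re.escape(w) for w in excluded_words)
    return re.compile(
        r"\b(?:(?!(?:" + blocked + r")\b)[a-z]+[^\w,\n]+)*?"
        r"\b(?:way|road|str(?:eet)?)\b"
        r"(?:[^\w,\n]+[a-z]+)*",
        re.IGNORECASE,
    )


# Hypothetical input: "Opposite" plays the same role as "Off".
text = "Yeet Road, Opposite Mandy Plant Way"

only_off = build_street_pattern(["off"])
off_and_opposite = build_street_pattern(["off", "opposite"])

print([m.group(0) for m in only_off.finditer(text)])
print([m.group(0) for m in off_and_opposite.finditer(text)])
```

With only "off" excluded, "Opposite" stays in the second match; once "opposite" is added to the list, the match starts at "Mandy". The lookahead only guards the words before the keyword, so an excluded word acts as a left boundary for the match.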

Also, you need to be more precise about which characters are allowed between words and which are not.

Upvotes: 2

Wiktor Stribiżew

Reputation: 626728

You may consider using

^\d+\s*(*SKIP)(*F)|\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b


Details:

  • ^\d+\s*(*SKIP)(*F) - matches 1 or more digits followed by 0+ whitespace characters at the start of the string, then discards that match, so a leading number is never part of a result
  • | - or
  • \b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b - any 0+ chars other than a comma, then any of the alternatives in the non-capturing group as a whole word, and then again 0+ chars other than a comma. The whole subpattern sits between word boundaries so that leading/trailing punctuation and whitespace are not matched.
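The (*SKIP)(*F) verbs are PCRE syntax and are not supported by Python's built-in re module. A rough equivalent, sketched below, is to let the first alternative consume the leading number and to wrap the second alternative in a capture group, keeping only the matches where that group is set. The names ADDRESS_RE and extract_segments are hypothetical, and the ignore-case flag is an assumption (the sample strings are capitalised, so the pattern needs it to match them).

```python
import re

# First alternative: leading number plus whitespace, matched only to be discarded.
# Second alternative: the answer's main branch, wrapped in capture group 1.
ADDRESS_RE = re.compile(
    r"^\d+\s*"
    r"|(\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b)",
    re.IGNORECASE,
)


def extract_segments(text):
    """Return the matches of the second alternative, skipping the discarded ones."""
    return [m.group(1) for m in ADDRESS_RE.finditer(text) if m.group(1) is not None]


for s in [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]:
    print(extract_segments(s))
```

Note that this pattern has no excluded-word lookahead, so on the question's samples it keeps "Off" ("Off Mandy Plant Way", "Off Tremble Rake Street"), while the leading "57 " is dropped as intended.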

Upvotes: 1
