Reputation: 45
I need to extract from a string every word that matches (way, road, str and street), together with all the words before and after it, up to a comma ',' character or a number in front.
Sample Strings:
1. Yeet Road, Off Mandy Plant Way, Mando GRA.
2. 3A, Sleek Drive, Off Tremble Rake Street.
3. 57 Radish Slist Road Ikoyi
Result should be as close as possible to:
Based on some Stack Overflow answers, this is what I currently have:
(?<=\,)(.*Way|Road|Str|Street?)(?=\,)
Any help would be appreciated.
Upvotes: 3
Views: 1743
Reputation: 89547
You can try something like this (with the ignore_case flag):
\b(?:(?!off\b)[a-z]+[^\w,\n]+)*?\b(?:way|road|str(?:eet)?)\b(?:[^\w,\n]+[a-z]+)*
However, patterns of this kind, which start by describing an undefined substring of undefined length before the literal parts of the pattern (the keywords), are not efficient. This doesn't matter for small strings, but they become impractical on a large string.
To exclude particular words you can change (?!off\b)
to (?!off\b|word1\b|word2\b|...)
Also, you need to be more precise about what characters are allowed or not between words.
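As a rough illustration, here is a minimal Python sketch that applies the pattern above to the three sample strings from the question. The helper names `build_pattern` and `extract_streets` and the `excluded` parameter are made up for this example; the only assumption about the engine is that Python's standard `re` module with `re.IGNORECASE` stands in for the ignore_case flag.

```python
import re


def build_pattern(excluded=("off",)):
    """Compile the street pattern with a configurable list of excluded words.

    `excluded` is a hypothetical parameter: each word becomes one
    alternative in the negative lookahead, as in (?!off\b|word1\b|...).
    """
    skip = "|".join(re.escape(word) + r"\b" for word in excluded)
    return re.compile(
        # words before the keyword, none of which may be an excluded word
        r"\b(?:(?!" + skip + r")[a-z]+[^\w,\n]+)*?"
        # the keyword itself, as a whole word
        r"\b(?:way|road|str(?:eet)?)\b"
        # words after the keyword, stopping at a comma or a newline
        r"(?:[^\w,\n]+[a-z]+)*",
        re.IGNORECASE,
    )


def extract_streets(text, excluded=("off",)):
    """Return every match as a string (the pattern has no capturing groups)."""
    return build_pattern(excluded).findall(text)


samples = [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]

for sample in samples:
    print(extract_streets(sample))
```

Because `[a-z]+` only accepts letters, a leading token such as `57` or `3A` cannot start a match, which is how the number in front gets dropped without a separate rule.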
Upvotes: 2
Reputation: 626728
You may consider using
^\d+\s*(*SKIP)(*F)|\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b
See the regex demo
Details:
^\d+\s*(*SKIP)(*F)
- matches and omits the initial 1 or more digits and then 0+ whitespaces at the start of the string
|
- or matches...
\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b
- any 0+ chars other than comma, then any of the alternatives in the non-capturing group as whole words, and then again 0+ chars other than comma; the whole subpattern is matched within word boundaries to avoid matching leading/trailing punctuation/whitespace.
Upvotes: 1
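The `(*SKIP)(*F)` verbs in the last answer are PCRE features, and Python's standard `re` module does not support them. A minimal sketch of a substitute, assuming Python, is to keep the "skip" branch as a plain alternative and wrap the wanted branch in a capturing group, then keep only the matches where that group took part. This swaps in a different technique from the one the answer uses, and `street_segments` is a name invented for this example.

```python
import re

# Branch 1 consumes the leading number so it cannot become part of a result.
# Branch 2 is the answer's second alternative, wrapped in a capturing group.
PATTERN = re.compile(
    r"^\d+\s*|(\b[^,]*\b(?:way|r(?:oa)?d|str(?:eet)?)\b[^,]*\b)",
    re.IGNORECASE,
)


def street_segments(text):
    """Return the comma-delimited segments that contain one of the keywords."""
    return [
        match.group(1)
        for match in PATTERN.finditer(text)
        if match.group(1) is not None  # None means the skip branch matched
    ]


samples = [
    "Yeet Road, Off Mandy Plant Way, Mando GRA.",
    "3A, Sleek Drive, Off Tremble Rake Street.",
    "57 Radish Slist Road Ikoyi",
]

for sample in samples:
    print(street_segments(sample))
```

Note that this pattern returns the whole segment between commas, so a leading "Off" is kept in the result; it does not filter words the way the lookahead-based pattern in the other answer does.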