Reputation: 35
I'm having difficulty with a Python regex. I want to find any of N, S, E, W, NB, SB, EB, WB, including at the start or end of the string. My regex easily finds these in the middle, but fails at the start or end.
Can anyone advise what I'm doing wrong with dirPattern in the code sample below?
Note: I realize I have some other problems to deal with (e.g. 'W of'), but think I know how to modify the regex for those.
Thanks in advance.
import re
nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB',
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']
dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}
dirPattern = re.compile(r'[ ^]([NSEW])B?[ $]')
print('name\tmatch\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        if dirString in dirMap:
            direction = dirMap[dirString]
    print('%s\t%s\t%s\t%s'%(name, match, dirString, direction))
Some sample expected output:
name match dirSting direction
Boulder Highway and US 95 NB <_sre.SRE_Match object at 0x7f68af836648> N North
Boulder Hwy and US 95 SB <_sre.SRE_Match object at 0x7f68ae836648> S South
Buffalo and Summerlin N <_sre.SRE_Match object at 0x7f68af826648> N North
Charleston and I-215 W <_sre.SRE_Match object at 0x7f68cf836648> W West
Flamingo and NB I-15 <_sre.SRE_Match object at 0x7f68af8365d0> N North
S Buffalo and Summerlin <_sre.SRE_Match object at 0x7f68aff36648> S South
Gibson and I-215 EB <_sre.SRE_Match object at 0x7f68afa36648> E East
However, names with the direction at the start or end give:
Boulder Highway and US 95 NB None None None
Upvotes: 1
Views: 174
Reputation: 35
The modified regex in this code does the trick. This includes handling things like 'W of', 'at E', and similar:
import re
nameList = ['Boulder Highway and US 95 NB', 'Boulder Hwy and US 95 SB',
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']
dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}
dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')
print('name\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s'%(name, dirString, direction))
The regex can be understood as follows:
(?:^| )
start with either beginning of string or a space
(?<! at )
not preceded by ' at '
(?<! of )
not preceded by ' of '
([NSEW])
Any one of 'N', 'S', 'E', 'W' (this will be in match.group(1))
B?
Optionally followed by 'B' (as in bound)
(?! of )
not followed by ' of '
(?: |$)
end with either end of string or a space
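For readability, the same breakdown can be written inline with re.VERBOSE. This is only a sketch of an equivalent pattern: the names DIR_PATTERN and find_direction are made up for illustration, and literal spaces are written as [ ] because verbose mode ignores unescaped whitespace outside character classes.

```python
import re

# Same pattern as above, spelled out with comments (re.VERBOSE).
# A literal space is written as [ ] so verbose mode does not drop it.
DIR_PATTERN = re.compile(r"""
    (?:^|[ ])        # start of string or a space
    (?<![ ]at[ ])    # not preceded by ' at '
    (?<![ ]of[ ])    # not preceded by ' of '
    ([NSEW])         # one direction letter, captured as group 1
    B?               # optional 'B' (as in bound)
    (?![ ]of[ ])     # not followed by ' of '
    (?:[ ]|$)        # a space or end of string
""", re.VERBOSE)


def find_direction(name):
    """Return the first direction letter found in name, or None."""
    match = DIR_PATTERN.search(name)
    return match.group(1) if match else None


if __name__ == '__main__':
    for sample in ('Boulder Highway and US 95 NB',
                   'S Buffalo and Summerlin',
                   'I-15 at 3.5 miles N of Jean'):
        print(sample, find_direction(sample))
```

Both lookbehinds stay exactly four characters wide, which matters because Python's re module only accepts fixed-width lookbehind.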
Final output is:
Boulder Highway and US 95 NB N North
Boulder Hwy and US 95 SB S South
Buffalo and Summerlin N N North
Charleston and I-215 W W West
Eastern and I-215 S S South
Flamingo and NB I-15 N North
S Buffalo and Summerlin S South
Flamingo and SB I-15 S South
Gibson and I-215 EB E East
I-15 at 3.5 miles N of Jean None None
I-15 NB S I-215 (dual) N North
I-15 SB 4.3 mile N of Primm S South
I-15 SB S of Russell S South
I-515 SB at Eastern W S South
I-580 at I-80 N E N North
I-580 at I-80 S W S South
I-80 at E 4TH St Kietzke Ln None None
I-80 East of W McCarran None None
LV Blvd at I-215 S S South
S Buffalo and I-215 W S South
S Decatur and I-215 WB S South
Sahara and I-15 East None None
Sands and Wynn South Gate None None
Silverado Ranch and I-15 (west side) None None
Side note: I decided I don't want to match the end-of-string case. For this, the regex would be:
dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')
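A quick check of how this variant behaves on a few of the names above (a sketch; noEndPattern and first_direction are illustrative names, not part of the original code):

```python
import re

# Variant from the side note: the direction must be followed by a space,
# so a direction at the very end of the string is no longer matched.
noEndPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')


def first_direction(name):
    """Return the first direction letter matched by noEndPattern, or None."""
    match = noEndPattern.search(name)
    return match.group(1) if match else None


if __name__ == '__main__':
    for sample in ('Buffalo and Summerlin N',   # direction at the end
                   'Flamingo and NB I-15',      # direction in the middle
                   'S Buffalo and Summerlin'):  # direction at the start
        print(sample, first_direction(sample))
```

With this pattern, a trailing 'N' or 'NB' yields no match, while directions at the start or in the middle are still found.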
Upvotes: 0
Reputation: 174696
You need to use lookarounds.
dirPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')
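[ ^] matches a space or a literal caret symbol, because ^ is not an anchor inside a character class (likewise, [ $] matches a space or a literal dollar sign).
(?<!\S) is a negative lookbehind asserting that the match is not preceded by a non-space character, so it succeeds after whitespace or at the start of the string.
(?!\S) is a negative lookahead asserting that the match is not followed by a non-space character.
I used negative lookarounds instead of positive ones because Python's default re module won't support (?<=^| ): lookbehind requires a fixed-width pattern.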
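A small comparison of the two patterns, as a sketch (first_dir and the sample strings are only for illustration). Note that this lookaround pattern does not exclude cases like 'N of' or 'at E'; it only fixes the start and end of string problem.

```python
import re

original = re.compile(r'[ ^]([NSEW])B?[ $]')         # ^ and $ are literal here
lookaround = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')  # whitespace or string edge


def first_dir(pattern, name):
    """Return group 1 of the first match of pattern in name, or None."""
    match = pattern.search(name)
    return match.group(1) if match else None


# A positive lookbehind with alternatives of different widths is rejected.
try:
    re.compile(r'(?<=^| )([NSEW])')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

if __name__ == '__main__':
    for sample in ('Boulder Highway and US 95 NB', 'S Buffalo and Summerlin'):
        print(sample, first_dir(original, sample), first_dir(lookaround, sample))
    print('variable-width lookbehind accepted:', lookbehind_ok)
```

The original pattern only matches at the edges when the string contains an actual '^' or '$' character, which is why it found nothing at the start or end of the real names.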
Upvotes: 1