Reputation: 674
I have the following series:
import pandas as pd
lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])
I also have the following dictionary. I joined its keys and values into a single pattern because I want to search for all of them in my series:
direction = {
'^Northwest$': '$NW^',
'^Northeast$': '$NE^',
'^Southeast$': '$SE^',
'^Southwest$': '$SW^',
'^North$': '$N^',
'^East$': '$E^',
"^South$": '$S^',
"^West$": "$W^"}
all_direction = direction.keys() | direction.values()
all_direction = '|'.join(all_direction)
My question is: why does lst.str.contains(all_direction, case = False)
return False for both rows, instead of True for both, since they contain NE and SW?
Upvotes: 0
Views: 1163
Reputation: 34
I think it is because pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])
results in:
output:
0 57 Freeport Crescent NE
1 890 4 Avenue SW
dtype: object
I don't have much experience with this, but it seems you can treat the Series like a list. SW and NE are not elements of that list, because each index holds a full address string rather than separate words. I'm not sure whether this answers your question, though.
Upvotes: 1
Reputation: 124
Instead of testing the whole regex pattern, I've taken the smallest piece that should match. Searching for '$NE^' in the pandas Series finds no match, which results in False:
lst.str.contains('$NE^', case = False)
0 False
1 False
dtype: bool
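This is because the regular expression is written incorrectly: '$' anchors the end of the string and '^' anchors the start, so '$NE^' can never match. The anchors should be the other way around, '^NE$', which matches only a string that is exactly 'NE'. To match the first row, the expression should be 'NE$', which means the string ends with NE (ignoring case, because of case = False).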
lst.str.contains('NE$', case = False)
0 True
1 False
dtype: bool
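To make the anchor behaviour concrete, here is a minimal sketch using Python's re module directly, since str.contains applies a regular-expression search to each element by default. The variable names are only illustrative and are not from the question.

```python
import re

addresses = ['57 Freeport Crescent NE', '890 4 Avenue SW']

# '$' asserts end of string and '^' asserts start of string, so '$NE^'
# asks for 'NE' to sit after the end and before the start: impossible.
impossible = re.compile(r'$NE^', re.IGNORECASE)
# 'NE$' only requires the string to end with NE.
ends_with_ne = re.compile(r'NE$', re.IGNORECASE)
# '^NE$' requires the whole string to be exactly NE.
exactly_ne = re.compile(r'^NE$', re.IGNORECASE)

never = [bool(impossible.search(a)) for a in addresses]
suffix = [bool(ends_with_ne.search(a)) for a in addresses]
whole = [bool(exactly_ne.search(a)) for a in addresses]

print(never)   # [False, False]
print(suffix)  # [True, False]
print(whole)   # [False, False]
```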
Upvotes: 1
Reputation: 46
After you join your dictionary keys and values, you'll get a string.
all_direction = '^Southeast$|$NE^|$SW^|$N^|$S^|$W^|^East$|^Northeast$|$SE^|$NW^|^South$|$E^|^Southwest$|^North$|^West$|^Northwest$'
lst.str.contains(all_direction) checks each element of 'lst' against 'all_direction', which pandas treats as a regular expression by default.
lst = 0 57 Freeport Crescent NE
1 890 4 Avenue SW
lst.str.contains(all_direction, case = False)
0 False
1 False
dtype: bool
No element of lst matches the pattern in all_direction. That's why every row returns False.
lst.str.contains('e', case=False)
0 True
1 True
dtype: bool
All the elements of lst contain the letter 'e'.
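As a rough illustration of the difference between a pattern match and a literal substring search, the sketch below uses the regex parameter of str.contains. The pattern 'NE$|SW$' is an assumption chosen for this example and is not the pattern from the question.

```python
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# Default (regex=True): '|' means alternation, so either suffix can match.
as_regex = lst.str.contains('NE$|SW$', case=False)

# regex=False: the pattern is searched for as literal text, and neither
# address contains the characters 'NE$|SW$' verbatim.
as_literal = lst.str.contains('NE$|SW$', case=False, regex=False)

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [False, False]
```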
Upvotes: 1
Reputation: 437
I believe it might be because of the symbols. Is it necessary to have $ and ^? Without those symbols, I believe the code would work.
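One possible way to try this is sketched below. It drops '^' and '$' from the dictionary strings and, as an added assumption that is not part of the original question, wraps the alternatives in \b word boundaries so that single letters such as 'E' or 'S' only match as standalone words.

```python
import re
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# Same mapping as in the question, but without '^' and '$' in the strings.
direction = {
    'Northwest': 'NW', 'Northeast': 'NE',
    'Southeast': 'SE', 'Southwest': 'SW',
    'North': 'N', 'East': 'E', 'South': 'S', 'West': 'W',
}

# Longest alternatives first so 'Northwest' is tried before 'North'.
terms = sorted(set(direction) | set(direction.values()), key=len, reverse=True)

# \b word boundaries are an assumption added here; without them a lone
# 'e' or 's' would match inside words such as 'Avenue' or 'Crescent'.
all_direction = r'\b(?:' + '|'.join(map(re.escape, terms)) + r')\b'

result = lst.str.contains(all_direction, case=False)
print(result.tolist())  # [True, True]
```

With the boundaries in place, an address with no direction word, such as '12 Main Street', should not match, whereas a bare 'N|E|S|W' alternation with case = False would match almost any address.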
Upvotes: 1