Xin

Reputation: 674

Why is str.contains() not returning the correct results?

I have the following series:

import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE',  '890 4 Avenue SW'])

And I have the following dictionary. I joined its keys and values together because I want to search for all of them in my series:

direction = {
        '^Northwest$': '$NW^',
        '^Northeast$': '$NE^',
        '^Southeast$': '$SE^',
        '^Southwest$': '$SW^',
        '^North$': '$N^',
        '^East$': '$E^',
        "^South$": '$S^',
        "^West$": "$W^"}

all_direction = direction.keys() | direction.values()
all_direction = '|'.join(all_direction)

My question is: why does lst.str.contains(all_direction, case = False) return False for both rows, instead of True for both, since they contain NE and SW?

Upvotes: 0

Views: 1163

Answers (4)

Dani Shubin

Reputation: 34

I think it is because pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW']) results in:

output:

0    57 Freeport Crescent NE
1            890 4 Avenue SW
dtype: object

I don't have much experience with this, but it seems you can treat the series as a list. SW and NE will not be in that list, because each index holds a full address string rather than separate words. I don't know if this answers your question, though.
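To check the difference between list membership and str.contains, here is a minimal sketch. It assumes the series from the question; the names per_element and in_list are only for illustration.

```python
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# str.contains searches inside each element, one row at a time.
per_element = lst.str.contains('NE')

# Plain list membership compares whole items, so 'NE' alone is not found.
in_list = 'NE' in lst.tolist()

print(per_element.tolist())  # [True, False]
print(in_list)               # False
```

So the whole-string issue would apply to a plain membership test, while str.contains does look inside each address.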

Upvotes: 1

Nikhil Khandelwal

Reputation: 124

Instead of using the whole regex pattern, I've taken the smallest part that should match. When you search for '$NE^' in the pandas series, nothing matches, so the result is False.

lst.str.contains('$NE^', case = False)

0    False
1    False
dtype: bool

This is because the regular expression you have written is incorrect. It should have been '^NE$', i.e. the string starts with 'NE' and ends with 'NE' (so the whole string is exactly 'NE'). To match the first row, though, your expression should be 'NE$', which means the string ends with NE, ignoring case.

lst.str.contains('NE$', case = False) 

0     True
1    False
dtype: bool
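If the goal is to match NE or SW as whole words anywhere in the address, one option is a word boundary instead of the ^ and $ anchors. This is a sketch under that assumption, not necessarily what the question intended; the names never and whole_word are only for illustration.

```python
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# '$' asserts the end of the string and '^' asserts the start, so '$NE^'
# asks for "NE" after the end and before the start: it can never match.
never = lst.str.contains(r'$NE^', case=False)

# \b is a word boundary, so this matches NE or SW only as whole words.
# (?:...) is a non-capturing group, which keeps the alternation together.
whole_word = lst.str.contains(r'\b(?:NE|SW)\b', case=False)

print(never.tolist())       # [False, False]
print(whole_word.tolist())  # [True, True]
```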

Upvotes: 1

Geetha Rangaswamaiah

Reputation: 46

After you join your dictionary keys and values, you'll get a string.

all_direction = '^Southeast$|$NE^|$SW^|$N^|$S^|$W^|^East$|^Northeast$|$SE^|$NW^|^South$|$E^|^Southwest$|^North$|^West$|^Northwest$'

lst.str.contains(all_direction) checks each element of lst against the pattern in all_direction, which is treated as a regular expression by default.

lst = 0    57 Freeport Crescent NE
      1    890 4 Avenue SW

lst.str.contains(all_direction, case = False)
0    False
1    False
dtype: bool

No element of lst matches any of the alternatives in all_direction. That's why it's returning False.

lst.str.contains('e', case=False)
0    True
1    True
dtype: bool

All the elements of lst contain the letter 'e'.
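For comparison, here is a small sketch of how the same pattern behaves as a regular expression (the default) and as a literal substring (regex=False). The pattern 'NE|SW' and the names as_regex and as_literal are chosen only for illustration.

```python
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# Default (regex=True): '|' means "or", so either alternative may match.
as_regex = lst.str.contains('NE|SW')

# regex=False: the pattern is a literal substring, so the text 'NE|SW'
# would have to appear verbatim in the address.
as_literal = lst.str.contains('NE|SW', regex=False)

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [False, False]
```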

Upvotes: 1

sammy

Reputation: 437

I believe it might be because of the symbols.

Is it necessary to have $ and ^? Without those symbols, I believe the code would work.
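A rough sketch of that idea, assuming the dictionary from the question with the $ and ^ symbols stripped. The \b word boundaries and re.escape are my additions: without them, a single letter such as 'E' would also match inside words like 'Freeport'.

```python
import re
import pandas as pd

lst = pd.Series(['57 Freeport Crescent NE', '890 4 Avenue SW'])

# Same mapping as in the question, with the ^ and $ symbols removed.
direction = {
    'Northwest': 'NW', 'Northeast': 'NE',
    'Southeast': 'SE', 'Southwest': 'SW',
    'North': 'N', 'East': 'E', 'South': 'S', 'West': 'W'}

all_direction = direction.keys() | direction.values()

# Join the alternatives and wrap them in word boundaries so that only
# whole words (for example a trailing "NE") count as a match.
pattern = r'\b(?:' + '|'.join(map(re.escape, all_direction)) + r')\b'

result = lst.str.contains(pattern, case=False)
print(result.tolist())  # [True, True]
```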

Upvotes: 1
