rahul soni

Reputation: 3

str.contains() in pandas returns no output when the search string matches the full value exactly

I am using str.contains() to search for a movie name in my dataframe. With a partial string it returns the correct output, but with the full string I get no output. How can I make this code snippet work correctly for both partial and full string matching?

Using contains on a partial string. If I use only '(Volume 1)' in minList, I also get the correct output, as I do with the one shown below:

minList = ['Star Wars: Clone Wars']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])

3208    Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object

Using contains on the full string:

minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])

This gives no output:

Series([], Name: name, dtype: object)

I also tried using query():

minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df.query('name.str.contains("' + k + '")',engine='python')['name'])

But this also gives no output:

Series([], Name: name, dtype: object)

Upvotes: 0

Views: 2647

Answers (1)

SeaBean

Reputation: 23237

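Add the argument regex=False to the str.contains() call.

By default, str.contains() treats its first argument as a regex (regular expression). The parentheses are therefore interpreted as regex grouping symbols rather than literal characters, so the pattern looks for "Clone Wars Volume 1" without parentheses and finds nothing. This is also why '(Volume 1)' on its own appears to work: it matches the text "Volume 1" inside the name.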
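To see the grouping behaviour in isolation, here is a minimal sketch using Python's standard re module directly, on the assumption that str.contains() with regex=True applies the same search semantics. The variable names are illustrative only and do not come from the question.

```python
import re

title = "Star Wars: Clone Wars (Volume 1)"

# Unescaped parentheses form a regex group, so this pattern looks for
# the text "Star Wars: Clone Wars Volume 1" with no literal parentheses.
unescaped = "Star Wars: Clone Wars (Volume 1)"
full_match = re.search(unescaped, title, flags=re.IGNORECASE)

# The same pattern does match a title written without parentheses.
group_match = re.search(
    unescaped, "Star Wars: Clone Wars Volume 1", flags=re.IGNORECASE
)

# A partial pattern with no regex metacharacters matches as expected.
partial_match = re.search("Star Wars: Clone Wars", title, flags=re.IGNORECASE)

print(full_match is None)         # True
print(group_match is not None)    # True
print(partial_match is not None)  # True
```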

Demo

import pandas as pd

data = {'name': ['Star Wars: Clone Wars (Volume 1)', 'Other strings']}
df = pd.DataFrame(data)
print(df)

Output:
                               name
0  Star Wars: Clone Wars (Volume 1)
1                     Other strings

minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False, regex=False)]["name"])

Output:   # String extracted successfully.

0    Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object

If you want to match the string with regex=True (the default), you need to escape the parentheses in the string passed as the first parameter:

minList = [r'Star Wars: Clone Wars \(Volume 1\)']

Demo

minList = [r'Star Wars: Clone Wars \(Volume 1\)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])

Output:       # String matched successfully

0    Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object

Here we use \( instead of ( and \) instead of ). We also wrote the pattern as a raw string, r'...', so that the backslashes reach the regex engine unchanged and we don't need to double them as \\( and \\).
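If the search strings come from data rather than being typed by hand, one possible alternative to escaping them manually is the standard library's re.escape(), which escapes regex metacharacters for you. The sketch below is only an illustration under that assumption; raw_titles and escaped are hypothetical names, and the check uses re.search directly rather than a dataframe.

```python
import re

# Hypothetical list of titles to search for, taken verbatim from data.
raw_titles = ["Star Wars: Clone Wars (Volume 1)"]

# re.escape() backslash-escapes regex metacharacters such as ( and ),
# so each title is matched literally.
escaped = [re.escape(t) for t in raw_titles]

target = "Star Wars: Clone Wars (Volume 1)"
literal_match = re.search(escaped[0], target, flags=re.IGNORECASE)

print(literal_match is not None)  # True
```

The escaped strings could then be passed to df.name.str.contains(k, case=False, na=False) in place of the hand-escaped pattern. When no regex features are needed at all, regex=False remains the simpler choice.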

Upvotes: 2
