Reputation: 3
I am using str.contains() to search for a movie name in my dataframe. With the full name I get no output, but with a partial string the output is correct. How can I make this code snippet work correctly for both partial and full string matching?
Using contains with a partial string: if I put only '(Volume 1)' in minList I get the correct output, and the same is true for the partial title shown below.
minList = ['Star Wars: Clone Wars']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])
3208    Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object
Using contains with the full string:
minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])
This gives no output:
Series([], Name: name, dtype: object)
I also tried using query():
minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df.query('name.str.contains("' + k + '")',engine='python')['name'])
But this also gives no output:
Series([], Name: name, dtype: object)
Upvotes: 0
Views: 2647
Reputation: 23237
Add the argument regex=False to the str.contains() call.
str.contains() treats its first parameter as a regex (regular expression) by default, so the parentheses are interpreted as regex grouping symbols and do not match literal parentheses.
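To see why the unescaped parentheses fail, here is a minimal sketch using Python's standard re module directly, outside pandas. The variable names are only illustrative. The group "(Volume 1)" matches the text "Volume 1" on its own, which is why the partial searches succeed while the full title does not.

```python
import re

title = "Star Wars: Clone Wars (Volume 1)"

# Unescaped: "(Volume 1)" is a capture group that matches the text "Volume 1",
# so the pattern expects "Clone Wars Volume 1" with no literal parentheses.
unescaped = re.search("Star Wars: Clone Wars (Volume 1)", title, flags=re.IGNORECASE)

# The group alone still matches, because "Volume 1" occurs inside the title.
group_only = re.search("(Volume 1)", title, flags=re.IGNORECASE)

print(unescaped)            # None
print(group_only.group(0))  # Volume 1
```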
import pandas as pd

data = {'name': ['Star Wars: Clone Wars (Volume 1)', 'Other strings']}
df = pd.DataFrame(data)
print(df)
Output:
name
0 Star Wars: Clone Wars (Volume 1)
1 Other strings
minList = ['Star Wars: Clone Wars (Volume 1)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False, regex=False)]["name"])
Output: # String extracted successfully.
0 Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object
If you want to match the string with regex=True, you need to escape the parentheses in the string passed as the first parameter:
minList = [r'Star Wars: Clone Wars \(Volume 1\)']
for k in minList:
    print(df[df.name.str.contains(k,case=False,na=False)]["name"])
Output: # String matched successfully
0 Star Wars: Clone Wars (Volume 1)
Name: name, dtype: object
Here we use \( instead of just ( and \) instead of just ). We also use a raw string r'....' to quote the whole pattern, so that we don't need to double the backslashes in a string that is interpreted as a regex.
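If the search strings come from data and you would rather not escape them by hand, one option is the standard library's re.escape(), which escapes the regex metacharacters for you. This is a sketch under the assumption that every entry in minList should be matched literally; the matches dictionary is a hypothetical helper added for illustration.

```python
import re
import pandas as pd

df = pd.DataFrame({"name": ["Star Wars: Clone Wars (Volume 1)", "Other strings"]})
minList = ["Star Wars: Clone Wars (Volume 1)", "(Volume 1)"]

matches = {}  # hypothetical helper: search string -> list of matching names
for k in minList:
    # re.escape turns "(" and ")" into "\(" and "\)" so they match literally
    pattern = re.escape(k)
    matches[k] = df[df.name.str.contains(pattern, case=False, na=False)]["name"].tolist()
    print(k, "->", matches[k])
```

For purely literal matching, regex=False is simpler. Escaping is mainly useful when the literal text has to be combined with other regex syntax.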
Upvotes: 2