Reputation: 110153
What is the correct way to check whether a string is contained in a field in pandas? For example, I have:
np.where('DIGITAL_SOURCE' in df['file_name'], 1, 0)
But I get the following complaint from Pandas:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
What would be the proper way to do substr in str? I believe the correct answer is using str.contains, but I was having some trouble with the syntax.
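For context, Python's in operator applied to a Series tests the index labels, not the values, and always yields a single bool, so it cannot give a per-row result for np.where. A minimal sketch with made-up sample data (the Series s is hypothetical):

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0, 1
s = pd.Series(["DIGITAL_SOURCE_01.csv", "other.csv"])

label_hit = 0 in s                  # True: 0 is an index label
value_hit = "other.csv" in s        # False: values are not searched

# A per-row substring test needs the vectorized string accessor instead
per_row = s.str.contains("DIGITAL_SOURCE", regex=False)
print(label_hit, value_hit, per_row.tolist())
```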
Upvotes: 1
Views: 296
Reputation: 10863
You can also apply a lambda to each row, like this:
df['new_column'] = df.apply(lambda x: 1 if 'DIGITAL_SOURCE' in x['file_name'] else 0, axis=1)
Example:
df = pd.DataFrame({"LOCATION":["USA","USA","USA","USA","JAPAN","JAPAN"],"file_name":["DIGITAL","DIGITAL","DIGITAL","DIGITAL","DIGITAL_SOURCE","DIGITAL_SOURCE"]})
  LOCATION       file_name
0      USA         DIGITAL
1      USA         DIGITAL
2      USA         DIGITAL
3      USA         DIGITAL
4    JAPAN  DIGITAL_SOURCE
5    JAPAN  DIGITAL_SOURCE
df['new_cl'] = df.apply(lambda x: 1 if 'DIGITAL_SOURCE' in x['file_name'] else 0, axis=1)
  LOCATION       file_name  new_cl
0      USA         DIGITAL       0
1      USA         DIGITAL       0
2      USA         DIGITAL       0
3      USA         DIGITAL       0
4    JAPAN  DIGITAL_SOURCE       1
5    JAPAN  DIGITAL_SOURCE       1
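As a variation on the row-wise apply above, the check can be run on the file_name column alone, which avoids passing every full row to the lambda. This is only a sketch reusing the sample data from this answer; the helper name has_digital_source is hypothetical, and its isinstance guard is an added assumption so that missing values (NaN floats) count as 0 instead of raising a TypeError.

```python
import pandas as pd

df = pd.DataFrame({
    "LOCATION": ["USA", "USA", "USA", "USA", "JAPAN", "JAPAN"],
    "file_name": ["DIGITAL", "DIGITAL", "DIGITAL", "DIGITAL",
                  "DIGITAL_SOURCE", "DIGITAL_SOURCE"],
})

def has_digital_source(name):
    """Hypothetical helper: 1 if name is a string containing the marker, else 0."""
    return int(isinstance(name, str) and "DIGITAL_SOURCE" in name)

# Apply to the single column rather than to whole rows (no axis=1 needed)
df["new_cl"] = df["file_name"].apply(has_digital_source)
print(df)
```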
Upvotes: -1
Reputation: 195428
As stated in the comments, you can use .str.contains (note the regex=False, so that the string is not treated as a regular expression):
import pandas as pd
df = pd.DataFrame({'file_name': ['DIGITAL_SOURCE', 'Other1', 'Other3']})
df['contains'] = df['file_name'].str.contains('DIGITAL_SOURCE', regex=False).astype(int)
print(df)
Prints:
        file_name  contains
0  DIGITAL_SOURCE         1
1          Other1         0
2          Other3         0
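Two details of .str.contains may be worth seeing in isolation: what regex=False changes, and how missing values behave. The sketch below uses made-up file names (an assumption, not data from the question). Passing na=False makes missing entries count as "no match", so the later astype(int) does not fail on NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data including a missing value
df = pd.DataFrame({"file_name": ["DIGITAL_SOURCE_01.csv", "other.csv", np.nan]})

# na=False treats missing values as non-matches
mask = df["file_name"].str.contains("DIGITAL_SOURCE", regex=False, na=False)
df["contains"] = mask.astype(int)

# Why regex=False matters: "." is a wildcard when the pattern is a regex
s = pd.Series(["file.csv", "filexcsv"])
regex_hits = s.str.contains(".csv")                  # "." matches any character
literal_hits = s.str.contains(".csv", regex=False)   # "." must be a literal dot
print(df)
print(regex_hits.tolist(), literal_hits.tolist())
```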
Upvotes: 3
Reputation: 323226
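You can use isin, but note that it tests whether the whole value equals one of the listed strings, not whether it contains a substring:
np.where(df['file_name'].isin(['DIGITAL_SOURCE']), 1, 0)
# Alternative that returns a Series instead of a NumPy array:
# df['file_name'].isin(['DIGITAL_SOURCE']).astype(int)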
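To make the difference from a substring test concrete, here is a small comparison on made-up values (the Series s is hypothetical). isin only flags entries that equal the listed string exactly, while .str.contains also flags longer strings that include it.

```python
import pandas as pd

# Hypothetical values: an exact match, a longer name containing it, and a miss
s = pd.Series(["DIGITAL_SOURCE", "DIGITAL_SOURCE_v2.csv", "other"])

exact = s.isin(["DIGITAL_SOURCE"]).astype(int)
substring = s.str.contains("DIGITAL_SOURCE", regex=False).astype(int)

print(exact.tolist())
print(substring.tolist())
```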
Upvotes: 1