Reputation: 203
I have two columns in a pandas DataFrame: authors and name. I want to create a third column whose value is True if the row's name is contained in that same row's authors, and False otherwise.
The result should look like the picture below.
I have tried .str.contains()
, .str.extract()
, .str.find()
, .where()
, and etc.
But Python returns an error: 'Series' objects are mutable, thus they cannot be hashed.
Does anyone know how to create the third column in Python?
Upvotes: 3
Views: 1006
Reputation: 210982
Here is a vectorized solution that uses the Series.str.split() and DataFrame.isin() methods:
df['Check'] = df.Authors.str.split(r'\s*,\s*', expand=True).isin(df.Name).any(axis=1)
Demo:
In [126]: df
Out[126]:
                 Authors     Name
0  S.Rogers, T. Williams   H. Tov
1      M. White, J.Black  J.Black
In [127]: df.Authors.str.split(r'\s*,\s*', expand=True)
Out[127]:
          0            1
0  S.Rogers  T. Williams
1  M. White      J.Black
In [128]: df.Authors.str.split(r'\s*,\s*', expand=True).isin(df.Name)
Out[128]:
       0      1
0  False  False
1  False   True
In [130]: df['Check'] = df.Authors.str.split(r'\s*,\s*', expand=True).isin(df.Name).any(axis=1)
In [131]: df
Out[131]:
                 Authors     Name  Check
0  S.Rogers, T. Williams   H. Tov  False
1      M. White, J.Black  J.Black   True
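A self-contained sketch of the same pipeline, for trying it outside IPython. The first two rows are copied from the demo; the third row is hypothetical and only there to show two details. First, split(..., expand=True) pads rows with fewer authors with missing values, which never match. Second, because df.Name is passed to isin() as a Series, pandas aligns it on the index, so each row's authors are compared only with that same row's Name. The AnyName column is also a hypothetical name, added to show that passing a plain list instead tests every author against every name in the column.

```python
import pandas as pd

# First two rows come from the demo above; the third row is hypothetical.
df = pd.DataFrame({
    "Authors": ["S.Rogers, T. Williams", "M. White, J.Black", "J.Black"],
    "Name": ["H. Tov", "J.Black", "A. Smith"],
})

# One column per author; rows with fewer authors are padded with missing values.
parts = df.Authors.str.split(r"\s*,\s*", expand=True)

# isin() with a Series aligns on the index, so the comparison is row-wise.
df["Check"] = parts.isin(df.Name).any(axis=1)

# Hypothetical contrast: a plain list makes every name a candidate for every row.
df["AnyName"] = parts.isin(df.Name.tolist()).any(axis=1)

print(df)
```

Check comes out as False, True, False and AnyName as False, True, True; the third row is where the index alignment matters.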
Upvotes: 0
Reputation: 394459
IIUC, you can apply a lambda row-wise to check whether the Name string is present in Authors:
df['Check'] = df.apply(lambda row: row['Name'] in row['Authors'], axis=1)
This should work.
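A runnable sketch of this answer, reusing the two sample rows from the demo in the other answer. The zip-based list comprehension (filling Check2, a hypothetical column name) is not part of the answer; it is a common alternative that performs the same row-wise test without DataFrame.apply.

```python
import pandas as pd

# Sample rows copied from the demo in the other answer.
df = pd.DataFrame({
    "Authors": ["S.Rogers, T. Williams", "M. White, J.Black"],
    "Name": ["H. Tov", "J.Black"],
})

# Row-wise apply: `in` between two strings is a substring test.
df["Check"] = df.apply(lambda row: row["Name"] in row["Authors"], axis=1)

# Alternative (hypothetical Check2 column): walk both columns in parallel.
df["Check2"] = [name in authors for name, authors in zip(df["Name"], df["Authors"])]

print(df)
```

Both columns come out as False, True. Because `in` is a substring test, a Name of "H. Tov" would also match an author listed as "H. Tovey", whereas the split-based answer compares whole names.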
You can't use str.contains(), str.extract(), str.find(), or where() here because you're comparing row by row, and those methods expect a single fixed pattern or list as the search criterion, not a different value for each row.
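A minimal sketch of the failure described above, using the demo data: str.contains() expects pat to be a single string, so passing the whole Name column should raise a TypeError. On typical pandas versions this is likely the same "'Series' objects are mutable" error quoted in the question, since the Series gets hashed when the pattern is compiled; the exact message may vary by version.

```python
import pandas as pd

df = pd.DataFrame({
    "Authors": ["S.Rogers, T. Williams", "M. White, J.Black"],
    "Name": ["H. Tov", "J.Black"],
})

raised = False
try:
    # pat should be one string; a Series cannot serve as a per-row pattern.
    df["Authors"].str.contains(df["Name"])
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```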
Upvotes: 4