Reputation: 340
I have a pandas DataFrame with two columns where each row holds a list of strings. How can I check whether the two columns share at least one word on the same row? (The flag column is the desired output.)
A             B            flag
hello,hi,bye  bye, also    1
but, as well  see, pandas  0
I have tried
df['A'].str.contains(df['B'])
but I got this error
TypeError: 'Series' objects are mutable, thus they cannot be hashed
Upvotes: 2
Views: 3804
Reputation: 863701
You can convert each value to a set of separate words with split and set, then check for an intersection with &. Convert the result to a boolean (an empty set converts to False), and finally to int, so that False becomes 0 and True becomes 1.
zipped = zip(df['A'], df['B'])
df['flag'] = [int(bool(set(a.split(',')) & set(b.split(',')))) for a, b in zipped]
print (df)
              A           B  flag
0  hello,hi,bye    bye,also     1
1   but,as well  see,pandas     0
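To make the chain of conversions concrete, here is a minimal sketch that walks through a single pair of values by hand. The variable names are illustrative only, and the two strings are taken from the first row of the sample data.

```python
a = 'hello,hi,bye'
b = 'bye,also'

words_a = set(a.split(','))   # {'hello', 'hi', 'bye'}
words_b = set(b.split(','))   # {'bye', 'also'}

common = words_a & words_b    # set intersection: {'bye'}
flag = int(bool(common))      # non-empty set -> True -> 1

print(common, flag)           # {'bye'} 1
```

If the two strings had no word in common, the intersection would be an empty set, which converts to False and then to 0.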
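Similar solution with NumPy (needs import numpy as np):
# zip() is a one-shot iterator in Python 3, so build the pairs again instead of reusing zipped
df['flag'] = np.array([set(a.split(',')) & set(b.split(',')) for a, b in zip(df['A'], df['B'])]).astype(bool).astype(int)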
print (df)
              A            B  flag
0  hello,hi,bye    bye, also     1
1   but,as well  see, pandas     0
EDIT: There may be some whitespace around the commas, so add map with str.strip, and also remove empty strings with filter:
df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well'],
                   'B': ['bye ,,, also', 'see,,,pandas']})
print (df)
               A             B
0   hello,hi,bye  bye ,,, also
1  but,,,as well  see,,,pandas
zipped = zip(df['A'], df['B'])

def setify(x):
    # strip whitespace from each piece first, then drop the empty strings
    return set(filter(None, map(str.strip, x.split(','))))

df['flag'] = [int(bool(setify(a) & setify(b))) for a, b in zipped]
print (df)
               A             B  flag
0   hello,hi,bye  bye ,,, also     1
1  but,,,as well  see,,,pandas     0
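The pieces above can also be wrapped in a small helper, so the pairs are rebuilt on every call rather than taken from a shared zip iterator. This is only a sketch: the names tokenize and add_overlap_flag and their parameters are hypothetical, and it assumes both columns hold comma-separated strings with no missing values.

```python
import pandas as pd


def tokenize(text, sep=','):
    """Split text on sep, strip whitespace, and drop empty pieces."""
    return {piece.strip() for piece in text.split(sep) if piece.strip()}


def add_overlap_flag(df, left='A', right='B', out='flag'):
    """Return a copy of df where out is 1 if the row shares a word, else 0."""
    result = df.copy()
    result[out] = [
        int(bool(tokenize(a) & tokenize(b)))
        for a, b in zip(result[left], result[right])
    ]
    return result


df = pd.DataFrame({'A': ['hello,hi,bye', 'but,,,as well'],
                   'B': ['bye ,,, also', 'see,,,pandas']})
flagged = add_overlap_flag(df)
print(flagged['flag'].tolist())  # [1, 0]
```

Returning a copy leaves the original frame untouched, which makes the helper safe to call repeatedly while experimenting with different separators.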
Upvotes: 3