I would like to check whether each word in the labels list appears in each list in the 'bigrams' column.
If one of these words appears in a bigram list, I would like to replace the label None with that word.
I tried writing two consecutive for loops, but it doesn't work. I also tried a list comprehension.
How can I do this?
Upvotes: 1
Views: 521
Reputation: 6483
You can use pd.Series.str.extract:
import pandas as pd
df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#              bgrams label
# 0  [hello, goodbye]  None
# 1        [dog, cat]  None
# 2             [cow]  None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.extract(regex)
Output:
df
             bgrams    label
0  [hello, goodbye]  goodbye
1        [dog, cat]      cat
2             [cow]      NaN
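One caveat: the pattern is matched against the string form of each list, so a label such as cat would also match inside a longer token like category, and a label containing regex metacharacters would be misinterpreted. A minimal sketch of a stricter pattern, assuming the labels should match whole words only (the sample data with 'category' is made up for illustration):

```python
import re
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'category'], ['cow']]})
labels = ['cat', 'goodbye']

# Escape each label and wrap the alternation in word boundaries so that
# 'cat' does not match inside 'category'.
regex = r'\b(' + '|'.join(re.escape(w) for w in labels) + r')\b'

# expand=False returns a Series rather than a one-column DataFrame.
df['label'] = df['bgrams'].astype(str).str.extract(regex, expand=False)
```

With this pattern, the second row gets no label, because category is not a whole-word match for cat.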
For multiple matches, you can use pd.Series.str.findall:
df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#                   bgrams label
# 0  [hello, goodbye, cat]  None
# 1             [dog, cat]  None
# 2                  [cow]  None
labels=['cat','goodbye']
regex='('+'|'.join(labels)+')'
df['label']=df.bgrams.astype(str).str.findall(regex)
Output:
df
                  bgrams           label
0  [hello, goodbye, cat]  [goodbye, cat]
1             [dog, cat]           [cat]
2                  [cow]              []
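If you would rather avoid converting the lists to strings, the same result can probably be obtained by testing list membership directly. This is a sketch of an alternative, not the regex approach above; the 'matches' column and the label_set name are introduced here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']]})
labels = ['cat', 'goodbye']
label_set = set(labels)  # set lookup is faster than scanning the list each time

# Keep the words of each list that are labels, in their original order.
df['matches'] = df['bgrams'].apply(lambda words: [w for w in words if w in label_set])

# First match per row, or a missing value when nothing matched.
df['label'] = df['matches'].apply(lambda m: m[0] if m else None)
```

Because this compares whole list elements, it cannot match part of a word and needs no regex escaping.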
Upvotes: 0