user13469785


Python: compare two lists of strings in a Pandas DataFrame

I would like to check whether each word in the labels list exists in each list in the 'bigrams' column.

If one of these words exists in the bigram list, I would like to replace the label None with the word that exists.

I tried writing two consecutive for loops, but it doesn't work. I also tried a list comprehension.

How can I do this?


Upvotes: 1

Views: 521

Answers (1)

MrNobody33

Reputation: 6483

You can convert each list to a string and use pd.Series.str.extract, which returns the first label matched in each row:

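import re
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello','goodbye'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#             bgrams label
#0  [hello, goodbye]  None
#1        [dog, cat]  None
#2             [cow]  None

labels=['cat','goodbye']

# Build one capture group with all labels: (cat|goodbye)
# re.escape protects labels that contain regex special characters
regex='('+'|'.join(map(re.escape, labels))+')'

# Stringify each list, then extract the first match per row (NaN if none)
df['label']=df.bgrams.astype(str).str.extract(regex, expand=False)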

Output:

df
             bgrams    label
0  [hello, goodbye]  goodbye
1        [dog, cat]      cat
2             [cow]      NaN
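
Note that matching a regex against the string form of each list matches substrings too, so a label like 'cat' would also be found inside a longer word such as 'category'. If you need exact element matches, here is a minimal sketch that checks list membership instead. The helper name first_label and the label_set variable are my own, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye'], ['dog', 'cat'], ['cow']],
                   'label': [None, None, None]})
labels = ['cat', 'goodbye']
label_set = set(labels)  # set lookup is faster than scanning the list each time

def first_label(words):
    """Return the first element of `words` that is a label, or None if there is none."""
    return next((w for w in words if w in label_set), None)

# Apply the helper row by row; rows without any label keep None
df['label'] = df['bgrams'].apply(first_label)
```

This keeps the lists as lists rather than converting them to strings, at the cost of a Python-level loop over the rows.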

For multiple matches, you can use pd.Series.str.findall:

df = pd.DataFrame({'bgrams': [['hello','goodbye','cat'],['dog','cat'],['cow']], 'label':[None,None,None]})
df
#                  bgrams label
#0  [hello, goodbye, cat]  None
#1             [dog, cat]  None
#2                  [cow]  None

labels=['cat','goodbye']

regex='('+'|'.join(labels)+')'

df['label']=df.bgrams.astype(str).str.findall(regex)

Output:

df
                  bgrams           label
0  [hello, goodbye, cat]  [goodbye, cat]
1             [dog, cat]           [cat]
2                  [cow]              []
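
If you would rather avoid the string conversion for the multiple-match case as well, a list comprehension per row should give the same result with exact element matching. This is only a sketch under the same assumptions as the example data above; label_set is a name I introduced:

```python
import pandas as pd

df = pd.DataFrame({'bgrams': [['hello', 'goodbye', 'cat'], ['dog', 'cat'], ['cow']],
                   'label': [None, None, None]})
labels = ['cat', 'goodbye']
label_set = set(labels)

# For each row, keep every element of the bigram list that is also a label,
# in the order it appears in the row
df['label'] = df['bgrams'].apply(lambda words: [w for w in words if w in label_set])

matches = df['label'].tolist()
```

Rows with no label end up with an empty list, as with str.findall.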

Upvotes: 0
