MikolajM

Reputation: 310

Pandas: How to check if a list of strings in a DataFrame row contains any of the strings in a Series from another DataFrame?

I have a DataFrame in which one column contains lists of strings, like this:

print(df_1.lists)

out:

0      [Pucku, Byłam, Miruś, Funkcjonariusze]
1      [Greger, Pytam, Jana, Dopóki, Wiary]
2      [Baborowa, Chcę, Innym, Baborowie]
etc

I also have another DataFrame with a Series of words:

print(df_2.check)

out:

0                   Olszany
1                    Pucków
2                  Baborowa
3                Studzionki
4                     Pytam
5                  Lasowice
etc

For each row of df_1.lists, I want to check whether the list contains any of the words from df_2.check. If it does, I'd like to assign the matching words to a new column in df_1. How can I do this?

[EDIT] I tried df_1.lists.apply(lambda x:[list(set(df_2.check.str.extract(r"("+ i +r")").dropna().values)) for i in x]) but this is way too slow.

Upvotes: 4

Views: 2361

Answers (1)

jezrael

Reputation: 862641

Use a nested list comprehension, which keeps the matching words in their original order:

df_1['new'] = [[y for y in x if y in df_2['check'].values] for x in df_1['lists']]

Or take the intersection of each list, converted to a set, with df_2['check'] (this drops duplicates and does not preserve the original word order):

df_1['new'] = [list(set(x).intersection(df_2['check'])) for x in df_1['lists']]

The same intersection, but with df_2['check'] converted to a set once up front, so it is not re-read for every row:

s = set(df_2['check'])
df_1['new'] = [list(set(x).intersection(s)) for x in df_1['lists']]
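For reference, here is a minimal self-contained sketch that combines the first and last ideas: it builds the set once and still uses a list comprehension, so matches stay in their original order. The sample data is copied from the question; the name lookup is just an illustrative choice.

```python
import pandas as pd

# Sample data taken from the question
df_1 = pd.DataFrame({
    "lists": [
        ["Pucku", "Byłam", "Miruś", "Funkcjonariusze"],
        ["Greger", "Pytam", "Jana", "Dopóki", "Wiary"],
        ["Baborowa", "Chcę", "Innym", "Baborowie"],
    ]
})
df_2 = pd.DataFrame({
    "check": ["Olszany", "Pucków", "Baborowa", "Studzionki", "Pytam", "Lasowice"]
})

# Build the lookup set once; membership tests on a set are O(1) on average
lookup = set(df_2["check"])

# Keep only the words that appear in the lookup set, preserving list order
df_1["new"] = [[word for word in words if word in lookup] for words in df_1["lists"]]

print(df_1["new"].tolist())
# [[], ['Pytam'], ['Baborowa']]
```

Note that this is exact string matching, so "Pucku" does not match "Pucków" and the first row ends up with an empty list.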

Upvotes: 6
