Reputation: 310
I have a DataFrame in which one of the columns contains lists of strings, like this one:
print(df_1.lists)
out:
0 [Pucku, Byłam, Miruś, Funkcjonariusze]
1 [Greger, Pytam, Jana, Dopóki, Wiary]
2 [Baborowa, Chcę, Innym, Baborowie]
etc
And I have another DataFrame with a column (a Series) that contains words:
print(df_2.check)
out:
0 Olszany
1 Pucków
2 Baborowa
3 Studzionki
4 Pytam
5 Lasowice
etc
I want to take each row of df_1.lists and check whether the list contains any of the words from df_2.check. If it does, I'd like to assign the matching words to a new column in df_1. How can I do it?
[EDIT] I tried df_1.lists.apply(lambda x:[list(set(df_2.check.str.extract(r"("+ i +r")").dropna().values)) for i in x]) but this is waaaay too slow.
Upvotes: 4
Views: 2361
Reputation: 862641
Use a nested list comprehension:
df_1['new'] = [[y for y in x if y in df_2['check'].values] for x in df_1['lists']]
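As a variation on the comprehension above, the membership test can be run against a set built once instead of against the values array on every lookup. This is only a sketch on the sample rows from the question; the name lookup is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df_1 = pd.DataFrame({"lists": [
    ["Pucku", "Byłam", "Miruś", "Funkcjonariusze"],
    ["Greger", "Pytam", "Jana", "Dopóki", "Wiary"],
    ["Baborowa", "Chcę", "Innym", "Baborowie"],
]})
df_2 = pd.DataFrame({"check": ["Olszany", "Pucków", "Baborowa",
                               "Studzionki", "Pytam", "Lasowice"]})

# Hypothetical helper name: build the set once, so each `in` test is a
# hash lookup rather than a scan of the whole array.
lookup = set(df_2["check"])

# Keep every word of each row's list that is in the lookup set,
# in its original order.
df_1["new"] = [[word for word in words if word in lookup]
               for words in df_1["lists"]]

print(df_1["new"].tolist())
# [[], ['Pytam'], ['Baborowa']]
```

Only exact matches count here, so Pucku does not match Pucków.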
Or get the intersection between a set and the check column for each value:
df_1['new'] = [list(set(x).intersection(df_2['check'])) for x in df_1['lists']]
Similarly, take the intersection between two sets, building the set of check words only once:
s = set(df_2['check'])
df_1['new'] = [list(set(x).intersection(s)) for x in df_1['lists']]
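One thing to keep in mind when choosing between these: a list comprehension keeps the order of the words and any repeats, while a set intersection returns each match once and in no guaranteed order. A small sketch with made-up data (the second row repeats a word on purpose, and the column names kept and unique are only for illustration):

```python
import pandas as pd

# Hypothetical data, not from the question: "Pytam" appears twice in row 1.
df_1 = pd.DataFrame({"lists": [
    ["Pucku", "Byłam"],
    ["Pytam", "Jana", "Pytam", "Baborowa"],
]})
df_2 = pd.DataFrame({"check": ["Olszany", "Baborowa", "Pytam"]})

s = set(df_2["check"])

# List comprehension: order and repeats from the original list are kept.
df_1["kept"] = [[y for y in x if y in s] for x in df_1["lists"]]

# Set intersection: each match once; sorted() is added here only to make
# the order of the result deterministic.
df_1["unique"] = [sorted(set(x) & s) for x in df_1["lists"]]

print(df_1["kept"].tolist())
# [[], ['Pytam', 'Pytam', 'Baborowa']]
print(df_1["unique"].tolist())
# [[], ['Baborowa', 'Pytam']]
```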
Upvotes: 6