Reputation: 140
I have the following two dataframes and would like to find their intersection.
import pandas as pd

df1 = pd.DataFrame({"0": [1524, 8788, 9899, 27172],
                    "1": [1333, 4476, 78783, 90832],
                    "2": [2021, 2022, 34522, 38479]})
print(df1)
       0      1      2
0   1524   1333   2021
1   8788   4476   2022
2   9899  78783  34522
3  27172  90832  38479
df2 has a single column '0' whose values are lists. It looks like this:
0
[1123, 2021, 1333, 6636],
[1245, 2022, 4477, 0],
[1524, 2023, 1, 27172],
[2021, 2023, 90832, 38479]
Expected output should be intersection of df1 and df2, for example:
df3 = [2021, 1333],
[2022],
[0],
[90832, 38479]
What I have read so far relates to finding the intersection for a single list, not for two dataframes with different data types. My end goal is to compute precision, which is the size of the intersection of df1 and df2 divided by the total number of my recommendations from df1, which is 3.
Additional note from comments below:
The rows are aligned and would be compared pairwise.
[0] in df3 does not appear anywhere but could work in case the intersection is 0.
Upvotes: 1
Views: 123
Reputation:
Given df1:
       0      1      2
0   1524   1333   2021
1   8788   4476   2022
2   9899  78783  34522
3  27172  90832  38479
and df2:
0
0 [1123, 2021, 1333, 6636]
1 [1245, 2022, 4477, 0]
2 [1524, 2023, 1, 27172]
3 [2021, 2023, 90832, 38479]
You can use set.intersection inside a list comprehension:
df1_lst = df1.to_numpy().tolist()
df2_lst = df2.to_numpy().tolist()
df3 = pd.DataFrame([[list(set(i).intersection(j[0]))] for i,j in zip(df1_lst, df2_lst)], columns=['col'])
Output:
col
0 [1333, 2021]
1 [2022]
2 []
3 [90832, 38479]
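The question's end goal was precision (intersection size divided by the number of recommendations per row), so here is a minimal sketch of that step. It assumes the rows are already plain Python lists, for example as produced by df1.to_numpy().tolist() above; the function name precision_per_row and the variable names are made up for illustration and are not part of pandas.

```python
def precision_per_row(recommended, relevant):
    """Return a list of (hits, precision) tuples, one per aligned row pair.

    recommended: list of lists (one row of recommendations each)
    relevant:    list of lists (the items to match against, same row order)
    """
    results = []
    for rec, rel in zip(recommended, relevant):
        hits = set(rec) & set(rel)  # pairwise set intersection
        # Assumption: precision = matches / number of recommendations in the row
        precision = len(hits) / len(rec) if rec else 0.0
        results.append((hits, precision))
    return results


# Data copied from the question
recommended = [[1524, 1333, 2021],
               [8788, 4476, 2022],
               [9899, 78783, 34522],
               [27172, 90832, 38479]]
relevant = [[1123, 2021, 1333, 6636],
            [1245, 2022, 4477, 0],
            [1524, 2023, 1, 27172],
            [2021, 2023, 90832, 38479]]

results = precision_per_row(recommended, relevant)
for hits, precision in results:
    print(sorted(hits), round(precision, 3))
```

A row with no matches yields an empty set and a precision of 0.0, which covers the [0] case mentioned in the question.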
Upvotes: 2
Reputation: 26676
lst = [[1123, 2021, 1333, 6636],
       [1245, 2022, 4477, 0],
       [1524, 2023, 1, 27172],
       [2021, 2023, 90832, 38479]]
s = [set(x) for x in lst]  # convert each inner list to a set
s1 = df1.agg(set, axis=1).to_list()  # list of sets, one per row of df1
[list(x.intersection(y)) for x, y in zip(s, s1)]
Output:
[[1333, 2021], [2022], [], [90832, 38479]]
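One caveat with the set-based approaches: sets are unordered, so the order of elements inside each result list is not guaranteed. If the order matters (the expected output in the question lists 2021 before 1333, following df2), a possible variation is to use a set only for membership tests and iterate over the list whose order should be kept. This is a sketch on plain lists; ordered_intersection is a hypothetical helper name.

```python
def ordered_intersection(reference, candidates):
    """Return the items of `reference` that also occur in `candidates`,
    keeping the order in which they appear in `reference`.
    Note: duplicates in `reference` are kept as-is."""
    lookup = set(candidates)  # fast membership test
    return [x for x in reference if x in lookup]


# Data copied from the question
df1_rows = [[1524, 1333, 2021],
            [8788, 4476, 2022],
            [9899, 78783, 34522],
            [27172, 90832, 38479]]
df2_rows = [[1123, 2021, 1333, 6636],
            [1245, 2022, 4477, 0],
            [1524, 2023, 1, 27172],
            [2021, 2023, 90832, 38479]]

# Keep the order of the df2 lists, as in the question's expected output
ordered = [ordered_intersection(rel, rec) for rec, rel in zip(df1_rows, df2_rows)]
print(ordered)
```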
Upvotes: 1