Finding intersection between two dataframes iteratively

Question

I have the following two dataframes and would like to find their intersection.

df1 = pd.DataFrame({"0": [1524, 8788, 9899, 27172],
                   "1": [1333, 4476, 78783, 90832],
                   "2": [2021, 2022, 34522, 38479]})

print(df1)

      0      1      2
0   1524   1333   2021
1   8788   4476   2022
2   9899  78783  34522
3  27172  90832  38479

df2 has a single column, '0', whose values are lists. It looks like this:

          0
[1123, 2021, 1333, 6636], 
[1245, 2022, 4477, 0], 
[1524, 2023, 1, 27172], 
[2021, 2023, 90832, 38479]

The expected output is the row-wise intersection of df1 and df2, for example:

df3 = [2021, 1333],
      [2022],
      [0],
      [90832, 38479]

What I have read so far covers finding the intersection for a single list, not for two dataframes with different data types. My end goal is to compute precision, which is the size of the intersection of df1 and df2 divided by the total number of my recommendations from df1, which is 3.

Additional note from the comments: the rows are aligned and would be compared pairwise. The [0] in df3 is not an actual match; it is a placeholder that could be used when the intersection is empty.

user7864386 · Accepted Answer

Given

df1:

       0      1      2
0   1524   1333   2021
1   8788   4476   2022
2   9899  78783  34522
3  27172  90832  38479

and df2:

                            0
0    [1123, 2021, 1333, 6636]
1       [1245, 2022, 4477, 0]
2      [1524, 2023, 1, 27172]
3  [2021, 2023, 90832, 38479]

You can use set.intersection inside a list comprehension to compare the rows pairwise:

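import pandas as pd

# Turn each frame into a list of rows; each df2 row is a one-element list wrapping the inner list
df1_lst = df1.to_numpy().tolist()
df2_lst = df2.to_numpy().tolist()
df3 = pd.DataFrame([[list(set(i).intersection(j[0]))] for i, j in zip(df1_lst, df2_lst)], columns=['col'])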

Output (the order of elements within each list may vary, because sets are unordered):

              col
0    [1333, 2021]
1          [2022]
2              []
3  [90832, 38479]
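
The question's end goal is precision: the number of matches in each row divided by the number of recommendations in df1 (3 per row). A minimal sketch of that step follows, assuming df2's single column is named "0" as a string and that every df1 row holds exactly one recommendation per column. The names hits, k, precision and mean_precision are illustrative and do not appear in the original post.

```python
import pandas as pd

df1 = pd.DataFrame({"0": [1524, 8788, 9899, 27172],
                    "1": [1333, 4476, 78783, 90832],
                    "2": [2021, 2022, 34522, 38479]})

# Assumption: df2 has one column named "0" whose cells are Python lists.
df2 = pd.DataFrame({"0": [[1123, 2021, 1333, 6636],
                          [1245, 2022, 4477, 0],
                          [1524, 2023, 1, 27172],
                          [2021, 2023, 90832, 38479]]})

# Pairwise (row-aligned) intersection, kept as sets so duplicates cannot inflate the count.
hits = [set(recs) & set(truth)
        for recs, truth in zip(df1.to_numpy().tolist(), df2["0"])]

# Number of recommendations per row, taken here as the number of df1 columns (3).
k = df1.shape[1]

# Precision per row: matches divided by recommendations; an empty intersection gives 0.0.
precision = pd.Series([len(h) / k for h in hits], name="precision")
mean_precision = precision.mean()

print(precision)
print(mean_precision)
```

Working with the sets directly means an empty intersection naturally yields a precision of 0, so no [0] placeholder is needed.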
