Luckasino

Reputation: 434

Using the isin() function on grouped data from two dataframes

I would like to use something similar to the function discussed in this topic: Using the isin() function on grouped data. However, I have two DataFrames of different lengths, both grouped by the same variable.

The function should group the Dev_stage column by Year in both DataFrames, compare the groups, and return the rows of df1 whose Dev_stage does not appear in df2 for the same year.
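A minimal sketch of that per-year comparison, assuming pandas and the sample data from this question. rows_missing_per_year is a hypothetical helper name, not a pandas function. It builds one set of Dev_stage values per Year from df2, then runs isin on each Year group of df1, where the group column is a plain Series and isin is available.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})


def rows_missing_per_year(left, right, value="Dev_stage", key="Year"):
    """Hypothetical helper: rows of `left` whose `value` is absent from `right` in the same `key` group."""
    # One set of Dev_stage values per Year in `right`
    right_sets = {year: set(group) for year, group in right.groupby(key)[value]}
    keep = pd.Series(True, index=left.index)
    for year, group in left.groupby(key):
        # A year missing from `right` gets an empty set, so all its rows are kept
        seen = right_sets.get(year, set())
        # isin works here because group[value] is a plain Series, not a SeriesGroupBy
        keep.loc[group.index] = ~group[value].isin(seen).to_numpy()
    return left[keep]


out = rows_missing_per_year(df1, df2)
print(out)
```

Using a boolean mask instead of concatenating the filtered groups keeps df1's original row order and index labels.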

My snippet:

>>> df1
Out:
    Dev_stage Year
0   1         1989
1   2         1989
2   2         1989
3   3         1989
4   1         1990
5   1         1990
6   3         1990

>>> df2
Out:
    Dev_stage Year
0   1         1989
1   2         1989
2   2         1990
3   1         1990
4   3         1990

I was trying something like this:

out = lambda x, y: x[~x['Dev_stage'].isin(y['Dev_stage'])]
out(df1.groupby('Year'), df2.groupby('Year'))

But I get the error: 'SeriesGroupBy' object has no attribute 'isin'. I thought the lambda would solve this.
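A short sketch of where the error comes from and one way to keep the isin idea, assuming pandas and the question's sample data. df1.groupby('Year')['Dev_stage'] is a SeriesGroupBy, which has no isin method. Dropping the groupby makes the call valid but ignores Year, so this swaps in a different technique: comparing (Dev_stage, Year) pairs with MultiIndex.isin.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# Without grouping, isin ignores Year: every Dev_stage value in df1 (1, 2, 3)
# occurs somewhere in df2, so nothing is returned.
ungrouped = df1[~df1["Dev_stage"].isin(df2["Dev_stage"])]

# Comparing (Dev_stage, Year) pairs applies the year condition explicitly.
pairs1 = pd.MultiIndex.from_frame(df1[["Dev_stage", "Year"]])
pairs2 = pd.MultiIndex.from_frame(df2[["Dev_stage", "Year"]])
mask = pairs1.isin(pairs2)  # NumPy bool array, True where the pair exists in df2
out = df1[~mask]

print(ungrouped)
print(out)
```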

I expect something like this:

out:   
    Dev_stage Year
3   3         1989

Thanks!

Upvotes: 2

Views: 467

Answers (2)

Mayank Porwal

Reputation: 34086

Use df.merge with indicator=True, then keep only the rows that are not present in both DataFrames:

In [958]: out = df1.merge(df2, how='left', indicator=True).query('_merge != "both"').drop(columns='_merge')

In [959]: out
Out[959]: 
   Dev_stage  Year
3          3  1989
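A self-contained version of this approach, assuming pandas and the question's sample data. The drop_duplicates() call on df2 is an addition not in the answer above: if df2 repeated a (Dev_stage, Year) pair, a left merge would duplicate the matching df1 rows. Note that merge returns a fresh index, so the result's labels come from the merged frame rather than from df1.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# Left merge on the shared columns (Dev_stage, Year); indicator=True adds a
# categorical "_merge" column holding "both" or "left_only" for each row.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep df1 rows with no matching pair in df2; the keyword form
# drop(columns=...) avoids passing axis positionally.
out = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(out)
```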

Upvotes: 1

Ynjxsjmh

Reputation: 30050

IIUC, you can use an inner merge to find the rows whose values match across multiple columns, then filter those rows out:

out = df1[~df1.index.isin(df1.reset_index().merge(df2, how='inner')['index'])]
print(out)

   Dev_stage  Year
3          3  1989
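A runnable sketch of the same reset_index plus inner-merge idea, assuming pandas and the question's sample data; the explicit on= list is an addition that spells out which columns are compared. With this data, df1 row 6 has the pair (3, 1990), which also appears in df2, so it is matched and filtered out along with rows 0, 1, 2, 4 and 5.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({"Dev_stage": [1, 2, 2, 3, 1, 1, 3],
                    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990]})
df2 = pd.DataFrame({"Dev_stage": [1, 2, 2, 1, 3],
                    "Year": [1989, 1989, 1990, 1990, 1990]})

# reset_index() turns df1's row labels into a column named "index"
# (the default name for an unnamed RangeIndex)
matched = df1.reset_index().merge(df2, how="inner", on=["Dev_stage", "Year"])

# Drop every df1 row whose label survived the inner merge
out = df1[~df1.index.isin(matched["index"])]
print(out)
```

Unlike the merge-with-indicator approach, this keeps df1's original index labels in the result.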

Upvotes: 2
