Reputation: 434
I would like to use something similar to the function discussed in this topic: Using the isin() function on grouped data. However, I have two DataFrames of different lengths, both grouped by the same variable.
The function should group column Dev_stage by Year in both DataFrames, compare the grouped data, and return the data that are missing from one of these grouped DataFrames.
My snippet:
>>> df1
Out:
Dev_stage Year
0 1 1989
1 2 1989
2 2 1989
3 3 1989
4 1 1990
5 1 1990
6 3 1990
>>> df2
Out:
Dev_stage Year
0 1 1989
1 2 1989
2 2 1990
3 1 1990
4 3 1990
I was trying something like this:
out = lambda x, y: x[~x['Dev_stage'].isin(y['Dev_stage'])]
out(df1.groupby('Year'), df2.groupby('Year'))
But I get the error: 'SeriesGroupBy' object has no attribute 'isin'. I thought the lambda would solve this.
Expecting something like this:
out:
Dev_stage Year
3 3 1989
Thanks!
Upvotes: 2
Views: 467
Reputation: 34086
Use df.merge with indicator=True:
In [958]: out = df1.merge(df2, how='left', indicator=True).query('_merge != "both"').drop(columns='_merge')
In [959]: out
Out[959]:
Dev_stage Year
3 3 1989
Upvotes: 1
Reputation: 30050
IIUC, you can use an inner merge to find the rows of df1 whose values match df2 across all shared columns, then filter those rows out of df1:
out = df1[~df1.index.isin(df1.reset_index().merge(df2, how='inner')['index'])]
print(out)
Dev_stage Year
3 3 1989
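If the goal is to keep df1's original row labels without a merge at all, one alternative is to compare (Year, Dev_stage) pairs directly. This is only a sketch of a different technique, not the approach in the answer above; the names `keys`, `pairs_in_df2`, and `mask` are illustrative, and the frames are rebuilt from the question's sample data.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({
    "Dev_stage": [1, 2, 2, 3, 1, 1, 3],
    "Year": [1989, 1989, 1989, 1989, 1990, 1990, 1990],
})
df2 = pd.DataFrame({
    "Dev_stage": [1, 2, 2, 1, 3],
    "Year": [1989, 1989, 1990, 1990, 1990],
})

keys = ["Year", "Dev_stage"]

# Every (Year, Dev_stage) pair that occurs in df2, as plain tuples.
pairs_in_df2 = list(df2[keys].itertuples(index=False, name=None))

# Boolean array: True where the df1 row's pair also occurs in df2.
mask = pd.MultiIndex.from_frame(df1[keys]).isin(pairs_in_df2)

# Keep the df1 rows whose pair is absent from df2; df1's index is preserved.
out = df1[~mask]
print(out)
```

This acts as a per-year `isin` check without calling `groupby`, since the year is part of the key being compared.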
Upvotes: 2