Reputation: 109
I have 2 dataframes which are "exactly" the same, except that DF1 has 1000 rows and DF2 has 950 rows. 50 rows were dropped, and I want to know which ones. Essentially DF2 is a subset of DF1, but I need to know which rows were dropped by another service elsewhere.
It would be easiest to get a 3rd dataframe (DF3) containing the 50 dropped rows.
DF3(50 rows x 4 columns) = DF1 (1000 rows x 4 columns) - DF2 (950 rows x 4 columns)
The index is the UniqueID.
Thank you!!
Upvotes: 2
Views: 457
Reputation: 120549
Essentially DF2 is a subset of DF1
You're right, so you can use difference from sets:
>>> df1.loc[df1.index.difference(df2.index)]
Example:
>>> df1
          A
0  0.712755
1  0.400005
2  0.958937
3  0.112367
4  0.230177
>>> df2
          A
0  0.712755
1  0.400005
4  0.230177
>>> df1.loc[df1.index.difference(df2.index)]
          A
2  0.958937
3  0.112367
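For a shape closer to the question (4 columns, UniqueID as the index), here is a minimal self-contained sketch. The column names, the ID strings, and the dropped IDs are made up for illustration, and it assumes the index values are unique:

```python
import pandas as pd

# Hypothetical data: 4 columns, indexed by a UniqueID (all names are assumptions).
df1 = pd.DataFrame(
    {"a": range(10), "b": range(10, 20), "c": range(20, 30), "d": range(30, 40)},
    index=pd.Index([f"id{i:02d}" for i in range(10)], name="UniqueID"),
)

# Simulate the other service dropping some rows.
dropped_ids = ["id03", "id07"]
df2 = df1.drop(index=dropped_ids)

# Rows whose index label is in df1 but not in df2.
df3 = df1.loc[df1.index.difference(df2.index)]

# Alternative: a boolean mask, which keeps df1's original row order.
df3_mask = df1[~df1.index.isin(df2.index)]

print(df3)
```

Note that Index.difference tries to sort the resulting labels by default, so if the original row order of df1 matters, the isin mask is probably the safer choice.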
Upvotes: 1