Reputation: 109

How can you tell which rows were dropped between the original dataframe and the current one?

I have two dataframes that are "exactly" the same, except that DF1 has 1000 rows and DF2 has 950 rows. 50 rows were dropped, and I want to know which ones. Essentially DF2 is a subset of DF1, but the rows were dropped by another service elsewhere, so I need to find out what is missing.

It would be easiest to return a third dataframe (DF3) that contains the 50 dropped rows.

DF3 (50 rows x 4 columns) = DF1 (1000 rows x 4 columns) - DF2 (950 rows x 4 columns)
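The subtraction above maps fairly directly onto DataFrame.drop: drop from DF1 every index label that still appears in DF2. A minimal sketch, assuming every UniqueID in DF2 also exists in DF1 (drop raises a KeyError otherwise); the frames and column names below are hypothetical stand-ins, not the actual data:

```python
import pandas as pd

# Hypothetical stand-in for DF1: 1000 rows x 4 columns, indexed by UniqueID.
df1 = pd.DataFrame(
    {f"col{i}": range(1000) for i in range(4)},
    index=pd.Index(range(1000), name="UniqueID"),
)

# Hypothetical stand-in for DF2: the same frame with 50 rows removed elsewhere.
df2 = df1.drop(index=list(range(100, 150)))

# DF3 = DF1 - DF2: remove every UniqueID that survived into DF2.
# drop raises KeyError if DF2 contains an ID that DF1 does not.
df3 = df1.drop(index=df2.index)

print(df3.shape)
```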

The index is the UniqueID.

Thank you!

Upvotes: 2

Answers (2)

Corralien

Reputation: 120549

Essentially DF2 is a subset of DF1

You're right, so you can take the set difference of the two indexes with Index.difference and select those labels from df1:

>>> df1.loc[df1.index.difference(df2.index)]

Example:

>>> df1
          A
0  0.712755
1  0.400005
2  0.958937
3  0.112367
4  0.230177

>>> df2
          A
0  0.712755
1  0.400005
4  0.230177

>>> df1.loc[df1.index.difference(df2.index)]
          A
2  0.958937
3  0.112367
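To check the same approach at the scale described in the question, here is a sketch with hypothetical data: 1000 rows x 4 columns indexed by UniqueID, with every 20th row removed so that exactly 50 rows are missing from df2. The column names and the choice of dropped rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data shaped like the question: 1000 rows x 4 columns.
df1 = pd.DataFrame(
    {c: range(1000) for c in ["A", "B", "C", "D"]},
    index=pd.Index(range(1000), name="UniqueID"),
)
# Drop every 20th row (0, 20, ..., 980), i.e. 50 rows, leaving 950.
df2 = df1.drop(index=list(range(0, 1000, 20)))

# Labels present in df1's index but not in df2's index.
missing_ids = df1.index.difference(df2.index)

# Select the dropped rows from the original frame.
df3 = df1.loc[missing_ids]

print(len(df2), len(df3))
```

Note that Index.difference sorts its result by default (sort=None), so if df1's index is not already sorted, df3 may come out in sorted-label order rather than in df1's original row order.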

Upvotes: 1

not_speshal

Reputation: 23166

Use isin on the index and invert the resulting boolean mask with ~:

df3 = df1[~df1.index.isin(df2.index)]
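A self-contained version of this one-liner, using small hypothetical frames that mirror the 5-row example in the other answer, might look like this:

```python
import pandas as pd

# Small hypothetical frames mirroring the 5-row example above.
df1 = pd.DataFrame({"A": [0.712755, 0.400005, 0.958937, 0.112367, 0.230177]})
df2 = df1.loc[[0, 1, 4]]

# True for each df1 row whose index label also appears in df2.
kept = df1.index.isin(df2.index)

# Invert the mask to keep only the rows that were dropped.
df3 = df1[~kept]

print(df3)
```

Because this filters df1 with a boolean mask, the result keeps df1's original row order and does not depend on any sorting of the index.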

Upvotes: 2
