Caroline
Caroline

Reputation: 167

Find difference in two different data-frames

I have two data frame df1 is 26000 rows, df2 is 25000 rows. Im trying to find data points that are in d1 but not in d2, vice versa. This is what I wrote (below code) but when I cross check it shows me shared data point

import pandas as pd 
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

df_join = pd.concat([df1,df2], axis = 1).drop_duplicates(keep = FALSE)

only_df1 = df_join.loc[df_join[df2.columns.to_list()].isnull().all(axis = 1), df1.columns.to_list()]

Order doesn't matter just want to know whether that data point exist in one or the other data frame.

Upvotes: 0

Views: 63

Answers (1)

user2246849
user2246849

Reputation: 4407

With two dfs:

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'a': [2, 3, 4, 5, 6], 'b': [1, 1, 1, 1, 1]})
print(df1)
print(df2)
   a  b
0  1  1
1  2  1
2  3  1
3  4  1
4  5  1
   a  b
0  2  1
1  3  1
2  4  1
3  5  1
4  6  1

You could do:

df_differences = df1.merge(df2, how='outer', indicator=True)

print(df_differences)

Result:

   a  b      _merge
0  1  1   left_only
1  2  1        both
2  3  1        both
3  4  1        both
4  5  1        both
5  6  1  right_only

And then:

only_df1 = df_differences[df_differences['_merge'].eq('left_only')].drop(columns=['_merge'])
only_df2 = df_differences[df_differences['_merge'].eq('right_only')].drop(columns=['_merge'])

print(only_df1)
print()
print(only_df2)
   a  b
0  1  1

   a  b
5  6  1

Upvotes: 1

Related Questions