Find difference in two different data-frames

Question

I have two data frame df1 is 26000 rows, df2 is 25000 rows. Im trying to find data points that are in d1 but not in d2, vice versa. This is what I wrote (below code) but when I cross check it shows me shared data point

import pandas as pd 
df1 = pd.read_csv('df1.csv')
df2 = pd.read_csv('df2.csv')

df_join = pd.concat([df1,df2], axis = 1).drop_duplicates(keep = FALSE)

only_df1 = df_join.loc[df_join[df2.columns.to_list()].isnull().all(axis = 1), df1.columns.to_list()]

Order doesn't matter just want to know whether that data point exist in one or the other data frame.

user2246849 · Accepted Answer

With two dfs:

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 1, 1, 1, 1]})
df2 = pd.DataFrame({'a': [2, 3, 4, 5, 6], 'b': [1, 1, 1, 1, 1]})
print(df1)
print(df2)

You could do:

df_differences = df1.merge(df2, how='outer', indicator=True)

print(df_differences)

Result:

   a  b      _merge
0  1  1   left_only
1  2  1        both
2  3  1        both
3  4  1        both
4  5  1        both
5  6  1  right_only

And then:

only_df1 = df_differences[df_differences['_merge'].eq('left_only')].drop(columns=['_merge'])
only_df2 = df_differences[df_differences['_merge'].eq('right_only')].drop(columns=['_merge'])

print(only_df1)
print()
print(only_df2)

Find difference in two different data-frames

Answers (1)

Related Questions