Reputation: 383

How to get unmatching data from 2 dataframes based on one column. (Pandas)

I have 2 data frames sample output is here My code for getting those and formatting the date column is here

First df:

csv_data_df = pd.read_csv(os.path.join(path_to_data+'\\Data\\',appendedfile)) csv_data_df['Date_Formatted'] = pd.to_datetime(csv_data_df['DATE']).dt.strftime('%Y-%m-%d') csv_data_df.head(3)

second df :

new_Data_df = pd.read_csv(io.StringIO(response.decode('utf-8')))
new_Data_df['Date_Formatted'] = 
pd.to_datetime(new_Data_df['DATE']).dt.strftime('%Y-%m-%d')
new_Data_df.head(3)`

I want to construct third dataframe where only the rows with un-matching dates from second dataframe needs to go in third one. Is there any method to do that. The date formatted column you can see in the screenshot.

Upvotes: 1

Answers (1)

EFT

Reputation: 2369

You could set the index of both dataframes to your desired join column, then use df1.combine_first(df2). For your specific example here, that could look like the below line.

csv_data_df.set_index('Date_Formatted').combine_first(new_Data_df.set_index('Date_Formatted')).reset_index()

Ex:

df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'), index=list(range(1, 6)))

df2 = pd.DataFrame(np.random.randn(8, 3), columns=list('abc'))

df
Out[10]: 
          a         b         c
1 -1.357517 -0.925239  0.974483
2  0.362472 -1.881582  1.263237
3  0.785508  0.227835 -0.604377
4 -0.386585 -0.511583  3.080297
5  0.660516 -1.393421  1.363900

df2
Out[11]: 
          a         b         c
0  1.732251 -1.977803  0.720292
1  0.048229  1.125277  1.016083
2 -1.684013  2.136061  0.553824
3 -0.022957  1.237249  0.236923
4 -0.998079  1.714126  1.291391
5  0.955464 -0.049673  1.629146
6  0.865864  1.137120  1.117207
7 -0.126944  1.003784 -0.180811

df.combine_first(df2)
Out[13]: 
          a         b         c
0  1.732251 -1.977803  0.720292
1 -1.357517 -0.925239  0.974483
2  0.362472 -1.881582  1.263237
3  0.785508  0.227835 -0.604377
4 -0.386585 -0.511583  3.080297
5  0.660516 -1.393421  1.363900
6  0.865864  1.137120  1.117207
7 -0.126944  1.003784 -0.180811

Upvotes: 1

How to get unmatching data from 2 dataframes based on one column. (Pandas)

Answers (1)

Related Questions