Reputation: 383
I have 2 data frames sample output is here
My code for getting those and formatting the date column is here
First df:
csv_data_df = pd.read_csv(os.path.join(path_to_data+'\\Data\\',appendedfile))
csv_data_df['Date_Formatted'] = pd.to_datetime(csv_data_df['DATE']).dt.strftime('%Y-%m-%d')
csv_data_df.head(3)
second df :
new_Data_df = pd.read_csv(io.StringIO(response.decode('utf-8')))
new_Data_df['Date_Formatted'] =
pd.to_datetime(new_Data_df['DATE']).dt.strftime('%Y-%m-%d')
new_Data_df.head(3)`
I want to construct third dataframe where only the rows with un-matching dates from second dataframe needs to go in third one. Is there any method to do that. The date formatted column you can see in the screenshot.
Upvotes: 1
Views: 538
Reputation: 2369
You could set the index of both dataframes to your desired join column, then
use df1.combine_first(df2)
. For your specific example here, that could look like the below line.
csv_data_df.set_index('Date_Formatted').combine_first(new_Data_df.set_index('Date_Formatted')).reset_index()
Ex:
df = pd.DataFrame(np.random.randn(5, 3), columns=list('abc'), index=list(range(1, 6)))
df2 = pd.DataFrame(np.random.randn(8, 3), columns=list('abc'))
df
Out[10]:
a b c
1 -1.357517 -0.925239 0.974483
2 0.362472 -1.881582 1.263237
3 0.785508 0.227835 -0.604377
4 -0.386585 -0.511583 3.080297
5 0.660516 -1.393421 1.363900
df2
Out[11]:
a b c
0 1.732251 -1.977803 0.720292
1 0.048229 1.125277 1.016083
2 -1.684013 2.136061 0.553824
3 -0.022957 1.237249 0.236923
4 -0.998079 1.714126 1.291391
5 0.955464 -0.049673 1.629146
6 0.865864 1.137120 1.117207
7 -0.126944 1.003784 -0.180811
df.combine_first(df2)
Out[13]:
a b c
0 1.732251 -1.977803 0.720292
1 -1.357517 -0.925239 0.974483
2 0.362472 -1.881582 1.263237
3 0.785508 0.227835 -0.604377
4 -0.386585 -0.511583 3.080297
5 0.660516 -1.393421 1.363900
6 0.865864 1.137120 1.117207
7 -0.126944 1.003784 -0.180811
Upvotes: 1