Reputation: 1045
I need to merge two files of data together following some rules in the process. I'm slightly aware of how to use pandas to create and merge csv's, but I do not know how to do so following certain rules.
I have two files as exampled:
actual_data.csv
type region_1 region_2 date data
aw west 2 10/01/2017 0.9318274971234
hss east 3 05/12/2015 0.7136487234972
. . . . .
. . . . .
. . . . .
forecast_data.csv
type region_1 region_2 date data
jad north 22 11/13/2025 0.71283741932
js east 3 04/16/2023 0.16238471239
. . . . .
. . . . .
. . . . .
I need to merge these files following these rules:
Upvotes: 3
Views: 503
Reputation: 2017
You need to do a left join of actual onto forecast (this based on your requirement that missing actual data should be filled, if you expect missing forecast data you would want to do an outer join).
import pandas as pd
act_df = pd.read_csv(actual_data.csv)
fore_df = pd.read_csv(forecast_data.csv)
res = fore_df.merge(act_df, on=['type', 'region_1', 'region_2'], how='left', suffixes=('_fore', '_act'))
At this point res will have an index of the merge columns ['type', region_1', region_2']
, 2 date columns (date_fore
, date_act
), and 2 data columns (data_fore
, date_act
). From there you can fill the null values of the actual date with the forecast date and then collapse it to a single column if you like:
res['date'] = res['date_act'].fillna(res['date_fore'])
res.drop(['date_fore', 'date_act'], axis=1, inplace=True)
For the data it's a similar operation:
res['data'] = res['data_act'].fillna(res['data_fore'])
res.drop(['data_fore', 'data_act'], axis=1, inplace=True)
Upvotes: 3
Reputation: 3519
combine_first
.import numpy as np
import pandas as pd
df1 = pd.DataFrame({'a':[1,2,np.nan], 'b':['x','y','z']})
df2 = pd.DataFrame({'a':[1,3,4], 'b':['a','b','c']})
df = pd.DataFrame()
for col in df1.columns:
df[col] = df1[col].combine_first(df2[col])
df
output:
# df1
a b
0 1.0 x
1 2.0 y
2 NaN z
#df2
a b
0 1 a
1 3 b
2 4 c
#df
a b
0 1.0 x
1 2.0 y
2 4.0 z
Upvotes: 1