Merge Two CSV Files With Cases Using Python

Question

I need to merge two files of data together following some rules in the process. I'm slightly aware of how to use pandas to create and merge csv's, but I do not know how to do so following certain rules.

I have two files as exampled:

actual_data.csv

type    region_1    region_2    date         data
aw      west        2           10/01/2017   0.9318274971234
hss     east        3           05/12/2015   0.7136487234972
.       .           .           .            .
.       .           .           .            .
.       .           .           .            .

forecast_data.csv

type    region_1    region_2    date         data
jad     north       22          11/13/2025   0.71283741932
js      east        3           04/16/2023   0.16238471239
.       .           .           .            .
.       .           .           .            .
.       .           .           .            .

I need to merge these files following these rules:

Merge actual_data.csv with forecast_data.csv
If a date for actual_data.csv is missing, use forecast_data.csv for that date
Always use actual_data.csv over forecast_data.csv when both are available for that specific month

it&#39;s-yer-boy-chet · Accepted Answer

You need to do a left join of actual onto forecast (this based on your requirement that missing actual data should be filled, if you expect missing forecast data you would want to do an outer join).

import pandas as pd

act_df = pd.read_csv(actual_data.csv)
fore_df = pd.read_csv(forecast_data.csv)

res = fore_df.merge(act_df, on=['type', 'region_1', 'region_2'], how='left', suffixes=('_fore', '_act'))

At this point res will have an index of the merge columns ['type', region_1', region_2'], 2 date columns (date_fore, date_act), and 2 data columns (data_fore, date_act). From there you can fill the null values of the actual date with the forecast date and then collapse it to a single column if you like:

res['date'] = res['date_act'].fillna(res['date_fore'])
res.drop(['date_fore', 'date_act'], axis=1, inplace=True)

For the data it's a similar operation:

res['data'] = res['data_act'].fillna(res['data_fore'])
res.drop(['data_fore', 'data_act'], axis=1, inplace=True)

Merge Two CSV Files With Cases Using Python

Answers (2)

Related Questions