Reputation: 2616
Suppose I have three dataframes as below:
df1 = pd.DataFrame({'2016-01': ['A0', 'A1', 'A2', 'A3'],
'2016-02': ['B0', 'B1', 'B2', 'B3'],
'2016-03': ['C0', 'C1', 'C2', 'C3'],
'2016-04': ['D0', 'D1', 'D2', 'D3']},
index=['N1', 'N2', 'N3', 'N4'])
df2 = pd.DataFrame({'2017-01': ['A4', 'A5', 'A6'],
'2017-02': ['B4', 'B5', 'B6'],
'2017-03': ['C4', 'C5', 'C6'],
'2017-04': ['D4', 'D5', 'D6']},
index=['N1', 'N3', 'N5'])
df3 = pd.DataFrame({'2018-01': ['A7', 'A8', 'A9'],
'2018-02': ['B7', 'B8', 'B9'],
'2018-03': ['C7', 'C8', 'C9'],
'2018-04': ['D7', 'D8', 'D9']},
index=['N1', 'N5', 'N6'])
>>> df1
2016-01 2016-02 2016-03 2016-04
N1 A0 B0 C0 D0
N2 A1 B1 C1 D1
N3 A2 B2 C2 D2
N4 A3 B3 C3 D3
>>> df2
2017-01 2017-02 2017-03 2017-04
N1 A4 B4 C4 D4
N3 A5 B5 C5 D5
N5 A6 B6 C6 D6
>>> df3
2018-01 2018-02 2018-03 2018-04
N1 A7 B7 C7 D7
N5 A8 B8 C8 D8
N6 A9 B9 C9 D9
As you can see, df1
, df2
and df3
have overlapping (but not exactly equal set of) indexes. How can I merge these df
's into one final_df
that looks like this:
df4 = pd.DataFrame({
'2016-01': ['A0', 'A1', 'A2', 'A3', '', ''],
'2016-02': ['B0', 'B1', 'B2', 'B3', '', ''],
'2016-03': ['C0', 'C1', 'C2', 'C3', '', ''],
'2016-04': ['D0', 'D1', 'D2', 'D3', '', ''],
'2017-01': ['A4', '', 'A5', '', 'A6', ''],
'2017-02': ['B4', '', 'B5', '', 'B6', ''],
'2017-03': ['C4', '', 'C5', '', 'C6', ''],
'2017-04': ['D4', '', 'D5', '', 'D6', ''],
'2018-01': ['A7', '', '', '', 'A8', 'A9'],
'2018-02': ['B7', '', '', '', 'B8', 'B9'],
'2018-03': ['C7', '', '', '', 'C8', 'C9'],
'2018-04': ['D7', '', '', '', 'D8', 'D9']},
index=['N1', 'N2', 'N3', 'N4', 'N5', 'N6'])
>>> df4
2016-01 2016-02 2016-03 2016-04 2017-01 2017-02 2017-03 2017-04 2018-01 2018-02 2018-03 2018-04
N1 A0 B0 C0 D0 A4 B4 C4 D4 A7 B7 C7 D7
N2 A1 B1 C1 D1
N3 A2 B2 C2 D2 A5 B5 C5 D5
N4 A3 B3 C3 D3
N5 A6 B6 C6 D6 A8 B8 C8 D8
N6 A9 B9 C9 D9
I have tried something like this:
final_df = pd.DataFrame()
for cur_df in [df1, df2, df3]:
# To handle the indexes that aren't overlapping between final_df and cur_df
if not(set(cur_df.index).issubset(set(final_df.index))):
final_df = pd.concat([final_df, cur_df[~cur_df.index.isin(final_df.index)]]) # this is working, but only
but this is only taking care of the non-overlapping indexes and the output of final_df
is like this:
>>> final_df
2016-01 2016-02 2016-03 2016-04 2017-01 2017-02 2017-03 2017-04 2018-01 2018-02 2018-03 2018-04
N1 A0 B0 C0 D0 NaN NaN NaN NaN NaN NaN NaN NaN
N2 A1 B1 C1 D1 NaN NaN NaN NaN NaN NaN NaN NaN
N3 A2 B2 C2 D2 NaN NaN NaN NaN NaN NaN NaN NaN
N4 A3 B3 C3 D3 NaN NaN NaN NaN NaN NaN NaN NaN
N5 NaN NaN NaN NaN A6 B6 C6 D6 NaN NaN NaN NaN
N6 NaN NaN NaN NaN NaN NaN NaN NaN A9 B9 C9 D9
I'm assuming there must be an easier way to do this in pandas
and if anyone could share insight/suggestion, I'd greatly appreciate that. Thank you!
Upvotes: 0
Views: 84