user1330974
user1330974

Reputation: 2616

How to combine data frames of different sizes and overlapping indexes vertically and horizontally in pandas?

Suppose I have three dataframes as below:

df1 = pd.DataFrame({'2016-01': ['A0', 'A1', 'A2', 'A3'],
'2016-02': ['B0', 'B1', 'B2', 'B3'],
'2016-03': ['C0', 'C1', 'C2', 'C3'],
'2016-04': ['D0', 'D1', 'D2', 'D3']},
index=['N1', 'N2', 'N3', 'N4'])

df2 = pd.DataFrame({'2017-01': ['A4', 'A5', 'A6'],
'2017-02': ['B4', 'B5', 'B6'],
'2017-03': ['C4', 'C5', 'C6'],
'2017-04': ['D4', 'D5', 'D6']},
index=['N1', 'N3', 'N5'])

df3 = pd.DataFrame({'2018-01': ['A7', 'A8', 'A9'],
'2018-02': ['B7', 'B8', 'B9'],
'2018-03': ['C7', 'C8', 'C9'],
'2018-04': ['D7', 'D8', 'D9']},
index=['N1', 'N5', 'N6'])

>>> df1
   2016-01 2016-02 2016-03 2016-04
N1      A0      B0      C0      D0
N2      A1      B1      C1      D1
N3      A2      B2      C2      D2
N4      A3      B3      C3      D3
>>> df2
   2017-01 2017-02 2017-03 2017-04
N1      A4      B4      C4      D4
N3      A5      B5      C5      D5
N5      A6      B6      C6      D6
>>> df3
   2018-01 2018-02 2018-03 2018-04
N1      A7      B7      C7      D7
N5      A8      B8      C8      D8
N6      A9      B9      C9      D9

As you can see, df1, df2 and df3 have overlapping (but not exactly equal set of) indexes. How can I merge these df's into one final_df that looks like this:

df4 = pd.DataFrame({
'2016-01': ['A0', 'A1', 'A2', 'A3', '', ''],
'2016-02': ['B0', 'B1', 'B2', 'B3', '', ''],
'2016-03': ['C0', 'C1', 'C2', 'C3', '', ''],
'2016-04': ['D0', 'D1', 'D2', 'D3', '', ''],
'2017-01': ['A4', '',  'A5', '', 'A6', ''],
'2017-02': ['B4', '', 'B5', '', 'B6', ''],
'2017-03': ['C4', '', 'C5', '', 'C6', ''],
'2017-04': ['D4', '', 'D5', '', 'D6', ''],
'2018-01': ['A7', '', '', '', 'A8', 'A9'],
'2018-02': ['B7', '', '', '', 'B8', 'B9'],
'2018-03': ['C7', '', '', '', 'C8', 'C9'],
'2018-04': ['D7', '', '', '', 'D8', 'D9']},
index=['N1', 'N2', 'N3', 'N4', 'N5', 'N6'])

>>> df4
   2016-01 2016-02 2016-03 2016-04 2017-01 2017-02 2017-03 2017-04 2018-01 2018-02 2018-03 2018-04
N1      A0      B0      C0      D0      A4      B4      C4      D4      A7      B7      C7      D7
N2      A1      B1      C1      D1
N3      A2      B2      C2      D2      A5      B5      C5      D5
N4      A3      B3      C3      D3
N5                                      A6      B6      C6      D6      A8      B8      C8      D8
N6                                                                      A9      B9      C9      D9

I have tried something like this:

final_df = pd.DataFrame()
for cur_df in [df1, df2, df3]:
    # To handle the indexes that aren't overlapping between final_df and cur_df
    if not(set(cur_df.index).issubset(set(final_df.index))):
        final_df = pd.concat([final_df, cur_df[~cur_df.index.isin(final_df.index)]]) # this is working, but only 

but this is only taking care of the non-overlapping indexes and the output of final_df is like this:

>>> final_df
   2016-01 2016-02 2016-03 2016-04 2017-01 2017-02 2017-03 2017-04 2018-01 2018-02 2018-03 2018-04
N1      A0      B0      C0      D0     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
N2      A1      B1      C1      D1     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
N3      A2      B2      C2      D2     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
N4      A3      B3      C3      D3     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
N5     NaN     NaN     NaN     NaN      A6      B6      C6      D6     NaN     NaN     NaN     NaN
N6     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN      A9      B9      C9      D9

I'm assuming there must be an easier way to do this in pandas and if anyone could share insight/suggestion, I'd greatly appreciate that. Thank you!

Upvotes: 0

Views: 84

Answers (1)

BENY
BENY

Reputation: 323266

You can just using

pd.concat([df1,df2,df3],axis=1,sort=True)

Upvotes: 3

Related Questions