pandas merge dataframes with multi-index

Question

I have a bunch of multi-index pd.DataFrames containing statistics for weather observations taken at a set of ground stations on certain days. Here's the structure of one such data frame:

>>> df = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
                   'day': [2]*6,
                   'station': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'mean': [55, 40, 84, 31, 44, 12],
                   'sd': [1., 2., 1.2, 3., 4., 0.7]})
>>> df.set_index(['station', 'month', 'day'])
>>> df
>>> df = df.set_index(['station', 'month', 'day'])
>>> df
                   mean   sd
station month day           
A       1     2      55  1.0
B       1     2      40  2.0
C       1     2      84  1.2
D       1     2      31  3.0
E       1     2      44  4.0
F       1     2      12  0.7

df stores all the observations for day 2 of month 1 (say, Jan 2). The station index is unique for each dataframe (no two station IDs are the same). However, the set of stations may vary between individual dataframes, depending on the month and/or day.

Question: how do I combine these dataframes into a single dataframe with the following structure (ignore the exact mean and sd values, i've made those up):

>>> df
                   mean   sd
station month day           
A       1     1      55  1.0
              2      44  5.0
              3      34  1.2
(...)
        2     1      55  1.0
              2      44  5.0
              3      34  1.2
(...)   
B       1     1      31  3.0
              2      44  5.0
              3      34  1.2
(...)

Bruno Carballo · Accepted Answer

You can use pd.concat:

In [15]: df1 = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
    ...:                    'day': [2]*6,
    ...:                    'station': ['A', 'B', 'C', 'D', 'E', 'F'],
    ...:                    'mean': [55, 40, 84, 31, 44, 12],
    ...:                    'sd': [1., 2., 1.2, 3., 4., 0.7]}).set_index(["station", "month", "day"])

In [16]: df2 = pd.DataFrame({'month': [2, 2, 2, 2, 2, 2],
    ...:                    'day': [2]*6,
    ...:                    'station': ['A', 'B', 'C', 'D', 'G', 'F'],
    ...:                    'mean': [55, 40, 84, 31, 15, 12],
    ...:                    'sd': [1., 2., 1.2, 3.,1, 0.7]}).set_index(["station", "month", "day"])

In [19]: pd.concat([df1,df2]).sort_index()
Out[19]:
                   mean   sd
station month day
A       1     2      55  1.0
        2     2      55  1.0
B       1     2      40  2.0
        2     2      40  2.0
C       1     2      84  1.2
        2     2      84  1.2
D       1     2      31  3.0
        2     2      31  3.0
E       1     2      44  4.0
F       1     2      12  0.7
        2     2      12  0.7
G       2     2      15  1.0

pandas merge dataframes with multi-index

Answers (1)

Related Questions