user2186862
user2186862

Reputation: 223

pandas merge dataframes with multi-index

I have a bunch of multi-index pd.DataFrames containing statistics for weather observations taken at a set of ground stations on certain days. Here's the structure of one such data frame:

>>> df = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
                   'day': [2]*6,
                   'station': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'mean': [55, 40, 84, 31, 44, 12],
                   'sd': [1., 2., 1.2, 3., 4., 0.7]})
>>> df.set_index(['station', 'month', 'day'])
>>> df
>>> df = df.set_index(['station', 'month', 'day'])
>>> df
                   mean   sd
station month day           
A       1     2      55  1.0
B       1     2      40  2.0
C       1     2      84  1.2
D       1     2      31  3.0
E       1     2      44  4.0
F       1     2      12  0.7

df stores all the observations for day 2 of month 1 (say, Jan 2). The station index is unique for each dataframe (no two station IDs are the same). However, the set of stations may vary between individual dataframes, depending on the month and/or day.

Question: how do I combine these dataframes into a single dataframe with the following structure (ignore the exact mean and sd values, i've made those up):

>>> df
                   mean   sd
station month day           
A       1     1      55  1.0
              2      44  5.0
              3      34  1.2
(...)
        2     1      55  1.0
              2      44  5.0
              3      34  1.2
(...)   
B       1     1      31  3.0
              2      44  5.0
              3      34  1.2
(...)

Upvotes: 0

Views: 56

Answers (1)

Bruno Carballo
Bruno Carballo

Reputation: 1196

You can use pd.concat:

In [15]: df1 = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
    ...:                    'day': [2]*6,
    ...:                    'station': ['A', 'B', 'C', 'D', 'E', 'F'],
    ...:                    'mean': [55, 40, 84, 31, 44, 12],
    ...:                    'sd': [1., 2., 1.2, 3., 4., 0.7]}).set_index(["station", "month", "day"])

In [16]: df2 = pd.DataFrame({'month': [2, 2, 2, 2, 2, 2],
    ...:                    'day': [2]*6,
    ...:                    'station': ['A', 'B', 'C', 'D', 'G', 'F'],
    ...:                    'mean': [55, 40, 84, 31, 15, 12],
    ...:                    'sd': [1., 2., 1.2, 3.,1, 0.7]}).set_index(["station", "month", "day"])

In [19]: pd.concat([df1,df2]).sort_index()
Out[19]:
                   mean   sd
station month day
A       1     2      55  1.0
        2     2      55  1.0
B       1     2      40  2.0
        2     2      40  2.0
C       1     2      84  1.2
        2     2      84  1.2
D       1     2      31  3.0
        2     2      31  3.0
E       1     2      44  4.0
F       1     2      12  0.7
        2     2      12  0.7
G       2     2      15  1.0

Upvotes: 2

Related Questions