Reputation: 223
I have a bunch of multi-index pd.DataFrames
containing statistics for weather observations taken at a set of ground stations on certain days. Here's the structure of one such data frame:
>>> df = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
'day': [2]*6,
'station': ['A', 'B', 'C', 'D', 'E', 'F'],
'mean': [55, 40, 84, 31, 44, 12],
'sd': [1., 2., 1.2, 3., 4., 0.7]})
>>> df.set_index(['station', 'month', 'day'])
>>> df
>>> df = df.set_index(['station', 'month', 'day'])
>>> df
mean sd
station month day
A 1 2 55 1.0
B 1 2 40 2.0
C 1 2 84 1.2
D 1 2 31 3.0
E 1 2 44 4.0
F 1 2 12 0.7
df
stores all the observations for day 2
of month 1
(say, Jan 2). The station index is unique for each dataframe (no two station IDs are the same). However, the set of stations may vary between individual dataframes, depending on the month and/or day.
Question: how do I combine these dataframes into a single dataframe with the following structure (ignore the exact mean
and sd
values, i've made those up):
>>> df
mean sd
station month day
A 1 1 55 1.0
2 44 5.0
3 34 1.2
(...)
2 1 55 1.0
2 44 5.0
3 34 1.2
(...)
B 1 1 31 3.0
2 44 5.0
3 34 1.2
(...)
Upvotes: 0
Views: 56
Reputation: 1196
You can use pd.concat:
In [15]: df1 = pd.DataFrame({'month': [1, 1, 1, 1, 1, 1],
...: 'day': [2]*6,
...: 'station': ['A', 'B', 'C', 'D', 'E', 'F'],
...: 'mean': [55, 40, 84, 31, 44, 12],
...: 'sd': [1., 2., 1.2, 3., 4., 0.7]}).set_index(["station", "month", "day"])
In [16]: df2 = pd.DataFrame({'month': [2, 2, 2, 2, 2, 2],
...: 'day': [2]*6,
...: 'station': ['A', 'B', 'C', 'D', 'G', 'F'],
...: 'mean': [55, 40, 84, 31, 15, 12],
...: 'sd': [1., 2., 1.2, 3.,1, 0.7]}).set_index(["station", "month", "day"])
In [19]: pd.concat([df1,df2]).sort_index()
Out[19]:
mean sd
station month day
A 1 2 55 1.0
2 2 55 1.0
B 1 2 40 2.0
2 2 40 2.0
C 1 2 84 1.2
2 2 84 1.2
D 1 2 31 3.0
2 2 31 3.0
E 1 2 44 4.0
F 1 2 12 0.7
2 2 12 0.7
G 2 2 15 1.0
Upvotes: 2