Reputation: 657
I have list of Dataframes that I want to compute the mean on
~ pieces[1].head()
Sample Label C_RUNTIMEN N_TQ N_TR ... N_GEAR1 N_GEAR2 N_GEAR3 \
301 manual 82.150833 7 69 ... 3.615 1.952 1.241
302 manual 82.150833 7 69 ... 3.615 1.952 1.241
303 manual 82.150833 7 69 ... 3.615 1.952 1.241
304 manual 82.150833 7 69 ... 3.615 1.952 1.241
305 manual 82.150833 7 69 ... 3.615 1.952 1.241
, So i am looping through them ->
pieces = np.array_split(df,size)
output = pd.DataFrame()
for piece in pieces:
dp = piece.mean()
output = output.append(dp,ignore_index=True)
Unfortunately the output is sorted (the column names are alphabetical in the output) and I want to keep the original column order (as seen up top).
~ output.head()
C_ABSHUM C_ACCFUELGALN C_AFR C_AFRO C_FRAIRWS C_GEARRATIO \
0 44.578937 66.183858 14.466816 14.113321 18.831117 6.677792
1 34.042593 66.231229 14.320409 14.113321 22.368983 6.677792
2 34.497194 66.309320 14.210066 14.113321 25.353414 6.677792
3 43.430931 66.376632 14.314854 14.113321 28.462130 6.677792
4 44.419204 66.516515 14.314653 14.113321 32.244107 6.677792
I have tried variations of concat etc with no success. Is there a different way to think about this ?
Upvotes: 0
Views: 66
Reputation: 648
My recommendation would be to concat the list of dataframes using pd.concat. This will allow you to use the standard group-by/apply. In this example, multi_df is a MultiIndex which behaves like a standard data frame, only the indexing and group by is a little different:
x = []
for i in range(10):
x.append(pd.DataFrame(dict(zip(list('abc'), [i + 1, i + 2, i + 3])), index = list('ind')))
Now x contains a list of data frames of the shape
a b c
i 2 3 4
n 2 3 4
d 2 3 4
And with
multi_df = pd.concat(x, keys = range(len(x)))
result = multi_df.groupby(level = [0]).apply(np.mean)
we get a data frame that looks like
a b c
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
8 9 10 11
9 10 11 12
You can then just call result.to_csv('filepath') to write that out.
Upvotes: 1