Pandas appending Series to DataFrame to write to a file

Question

I have a list of DataFrames, and I want to compute the column means of each one.

~ pieces[1].head()

   Sample Label    C_RUNTIMEN  N_TQ  N_TR  ...   N_GEAR1  N_GEAR2  N_GEAR3  \
301       manual   82.150833     7    69  ...     3.615    1.952    1.241   
302       manual   82.150833     7    69  ...     3.615    1.952    1.241   
303       manual   82.150833     7    69  ...     3.615    1.952    1.241   
304       manual   82.150833     7    69  ...     3.615    1.952    1.241   
305       manual   82.150833     7    69  ...     3.615    1.952    1.241

So I am looping through them:

pieces = np.array_split(df,size)
output = pd.DataFrame()
for piece in pieces:
    dp = piece.mean()
    output = output.append(dp,ignore_index=True)

Unfortunately the output is sorted (the column names are alphabetical in the output) and I want to keep the original column order (as seen up top).

~ output.head()

  C_ABSHUM  C_ACCFUELGALN      C_AFR     C_AFRO  C_FRAIRWS  C_GEARRATIO  \
  0  44.578937      66.183858  14.466816  14.113321  18.831117     6.677792   
  1  34.042593      66.231229  14.320409  14.113321  22.368983     6.677792   
  2  34.497194      66.309320  14.210066  14.113321  25.353414     6.677792   
  3  43.430931      66.376632  14.314854  14.113321  28.462130     6.677792   
  4  44.419204      66.516515  14.314653  14.113321  32.244107     6.677792

I have tried variations of concat etc. with no success. Is there a different way to think about this?

paulsef11 · Accepted Answer

My recommendation would be to concatenate the list of data frames using pd.concat. This will allow you to use the standard group-by/apply. In this example, multi_df is a data frame with a MultiIndex; it behaves like a standard data frame, only the indexing and group-by are a little different:

import pandas as pd

# Build ten small frames; frame i has columns a, b, c filled with i+1, i+2, i+3.
x = []
for i in range(10):
    x.append(pd.DataFrame(dict(zip(list('abc'), [i + 1, i + 2, i + 3])), index=list('ind')))

Now x contains a list of ten data frames, each with three rows (index i, n, d) and three columns (a, b, c).

And with

multi_df = pd.concat(x, keys=range(len(x)))
# Average each original frame (outer index level 0), column by column.
result = multi_df.groupby(level=0).mean()

we get a data frame that looks like

    a   b   c
0   1   2   3
1   2   3   4
2   3   4   5
3   4   5   6
4   5   6   7
5   6   7   8
6   7   8   9
7   8   9  10
8   9  10  11
9  10  11  12

You can then just call result.to_csv('filepath') to write that out.
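As a rough sketch of how this might look with the pieces from the question: the small df below is made up for illustration, numeric_only=True is my assumption for skipping a text column such as Sample Label, and the split is done on row positions rather than on the frame itself. The CSV goes to an in-memory buffer here; pass a file path to to_csv to write to disk instead.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's df: one text column plus numeric
# columns in a deliberately non-alphabetical order.
df = pd.DataFrame({
    "Label": ["manual"] * 6,
    "N_TQ": [7, 7, 8, 8, 9, 9],
    "C_RUNTIMEN": [1.0, 3.0, 5.0, 7.0, 9.0, 11.0],
    "N_GEAR1": [3.0, 3.0, 4.0, 4.0, 5.0, 5.0],
})
size = 3  # number of pieces, as in the question

# Split row positions into `size` chunks, then slice the frame with iloc.
positions = np.array_split(np.arange(len(df)), size)
pieces = [df.iloc[idx] for idx in positions]

# Stack the pieces under an outer index level (0, 1, 2, ...) and average
# each group; numeric_only=True leaves the text column out of the result.
multi_df = pd.concat(pieces, keys=range(len(pieces)))
result = multi_df.groupby(level=0).mean(numeric_only=True)

# The numeric columns keep the order they had in df.
print(list(result.columns))

# Write the means as CSV into a buffer (a path would write a file instead).
buffer = io.StringIO()
result.to_csv(buffer)
csv_text = buffer.getvalue()
```

Because the means come from a single groupby rather than from repeated appends, the columns stay in the order they had in the original frame.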
