loop for computing average of selected data in dataframe using pandas

Question

I have a 3-row by 96-column dataframe. I'm trying to compute the average of the two numeric rows below the header (columns Run 1 to Run 96) over every 12 data points. Here is my dataframe:

 Run 1       Run 2      Run 3       Run 4       Run 5       Run 6  \
0  1461274.92  1458079.44  1456807.1  1459216.08  1458643.24  1457145.19   
1   478167.44   479528.72  480316.08   475569.52   472989.01   476054.89   
2      ------      ------     ------      ------      ------      ------   

    Run 7       Run 8       Run 9      Run 10     ...          Run 87  \
0  1458117.08  1455184.82  1455768.69  1454738.07     ...      1441822.45   
1   473630.89   476282.93   475530.87   474200.22     ...        468525.2   
2      ------      ------      ------      ------     ...          ------   

   Run 88      Run 89      Run 90      Run 91      Run 92      Run 93  \
0  1445339.53  1461050.97  1446849.43  1438870.43  1431275.76  1430781.28   
1    460076.8   473263.06   455885.07   475245.64   483875.35   487065.25   
2      ------      ------      ------      ------      ------      ------   

   Run 94      Run 95      Run 96  
0  1436007.32  1435238.23  1444300.51  
1   474328.87   475789.12   458681.11  
2      ------      ------      ------  

[3 rows x 96 columns]

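Currently I am trying to use df.irow(0) to select all the data in row index 0 (df.irow has since been removed from pandas; df.iloc[0] is the equivalent), something along the lines of:

import numpy as np
import pandas as pd

selection = np.arange(0, 13)

for i in selection:
    new_df = pd.DataFrame()
    data = df.iloc[0]  # every value in row 0 (what df.irow(0) used to return)

    # ... this is where I don't know how to continue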

Then I get lost.

I just don't know how to link this range with the dataframe in order to compute the mean of every 12 data points in each row.

To summarize, I want the average of every block of 12 runs, for each of the two numeric rows. So I should end up with a separate dataframe of 2 × 8 average values (96 / 12 = 8). Any ideas?

Thanks.
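The loop the question is reaching for can be completed with positional column slices. This is a minimal sketch, assuming the two numeric rows have already been separated from the ------ row and converted to float; the random values and the names block, averages and result are hypothetical, not the asker's data or code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: two numeric rows and
# 96 columns named "Run 1" ... "Run 96" (random values, not the real data).
rng = np.random.default_rng(0)
cols = [f"Run {i}" for i in range(1, 97)]
df = pd.DataFrame(rng.random((2, 96)) * 1e6, columns=cols)

block = 12
averages = {}
# Walk over the starting column positions 0, 12, 24, ..., 84.
for group, start in enumerate(range(0, len(df.columns), block)):
    # iloc takes columns start .. start + 11 (the slice end is exclusive).
    chunk = df.iloc[:, start:start + block]
    # Row-wise mean across those 12 columns: one value per numeric row.
    averages[group] = chunk.mean(axis=1)

# Dict keys become the columns (groups 0..7); the rows stay 0 and 1.
result = pd.DataFrame(averages)
print(result.shape)  # (2, 8)
```

This works, but it spells out by hand what a single groupby call can do.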

DSM · Accepted Answer

You can do a groupby on axis=1 (using some dummy data I made up):

>>> h = df.iloc[:2].astype(float)
>>> h.groupby(np.arange(len(h.columns))//12, axis=1).mean()
          0         1         2         3         4         5         6         7
0  0.609643  0.452047  0.536786  0.377845  0.544321  0.214615  0.541185  0.544462
1  0.382945  0.596034  0.659157  0.437576  0.490161  0.435382  0.476376  0.423039

First we take the two numeric rows and force them to float. The presence of the ------ row means the columns have probably been given object dtype, which will make the mean unhappy.
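To see why the cast is needed, here is a small sketch on a hypothetical two-column miniature of the question's frame (only the first two runs are copied). It also shows a pd.to_numeric alternative, which is a swapped-in technique rather than part of the answer: it turns any non-numeric cell into NaN and then drops the all-NaN separator row, so it does not depend on where that row sits.

```python
import pandas as pd

# Hypothetical miniature of the question's frame: two numeric rows plus a
# "------" separator row. Mixing floats and strings gives object dtype.
df = pd.DataFrame({
    "Run 1": [1461274.92, 478167.44, "------"],
    "Run 2": [1458079.44, 479528.72, "------"],
})
print(df.dtypes)  # Run 1 and Run 2 are both object

# The answer's approach: keep the first two rows, then cast to float.
h = df.iloc[:2].astype(float)
print(h.dtypes)  # both float64 now

# Alternative: coerce non-numeric cells to NaN, then drop all-NaN rows.
coerced = df.apply(pd.to_numeric, errors="coerce").dropna(how="all")
print(coerced.values.tolist() == h.values.tolist())  # True
```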

Then we make an array saying which group each column belongs to:

>>> np.arange(len(df.columns))//12
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5,
       5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
       7, 7, 7, 7], dtype=int32)

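which we pass to groupby as the grouping key, so each run of 12 consecutive columns shares one label. .mean() then averages the columns within each group, giving one value per row per group.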
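Note that groupby(..., axis=1) was deprecated in pandas 2.1; the suggested replacement is to transpose, group the rows, and transpose back. Below is a sketch of that on hypothetical random data shaped like the question's numeric rows, alongside a plain NumPy reshape that computes the same block means. The reshape variant assumes the column count is an exact multiple of 12.

```python
import numpy as np
import pandas as pd

# Hypothetical dummy data shaped like the question's numeric rows: 2 x 96.
rng = np.random.default_rng(42)
h = pd.DataFrame(rng.random((2, 96)),
                 columns=[f"Run {i}" for i in range(1, 97)])

groups = np.arange(len(h.columns)) // 12  # 0,0,...,0,1,...,7 (12 of each)

# Transpose so the runs become rows, group those rows, then transpose back.
via_transpose = h.T.groupby(groups).mean().T

# NumPy alternative: view each row as 8 blocks of 12 and average each block.
via_reshape = h.to_numpy().reshape(len(h), -1, 12).mean(axis=2)

print(via_transpose.shape)  # (2, 8)
print(np.allclose(via_transpose.to_numpy(), via_reshape))  # True
```

The transpose version keeps pandas labels (rows 0 and 1, groups 0 to 7), while the reshape version returns a bare 2 × 8 array.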
