Reputation: 934
I have a 3 row x 96 column dataframe. I'm trying to computer the average of the two rows beneath the index (row1:96) for every 12 data points. here is my dataframe:
Run 1 Run 2 Run 3 Run 4 Run 5 Run 6 \
0 1461274.92 1458079.44 1456807.1 1459216.08 1458643.24 1457145.19
1 478167.44 479528.72 480316.08 475569.52 472989.01 476054.89
2 ------ ------ ------ ------ ------ ------
Run 7 Run 8 Run 9 Run 10 ... Run 87 \
0 1458117.08 1455184.82 1455768.69 1454738.07 ... 1441822.45
1 473630.89 476282.93 475530.87 474200.22 ... 468525.2
2 ------ ------ ------ ------ ... ------
Run 88 Run 89 Run 90 Run 91 Run 92 Run 93 \
0 1445339.53 1461050.97 1446849.43 1438870.43 1431275.76 1430781.28
1 460076.8 473263.06 455885.07 475245.64 483875.35 487065.25
2 ------ ------ ------ ------ ------ ------
Run 94 Run 95 Run 96
0 1436007.32 1435238.23 1444300.51
1 474328.87 475789.12 458681.11
2 ------ ------ ------
[3 rows x 96 columns]
Currently I am trying to use df.irow(0) to select all the data in row index 0.
something along the lines of:
selection = np.arange(0,13)
for i in selection:
new_df = pd.DataFrame()
data = df.irow(0)
........
then i get lost
I just don't know how to link this range with the dataframe in order to computer the mean for every 12 data points in each column.
To summarize, I want the average for every 12 runs in each column. So, i should end up with a separate dataframe with 2 * 8 average values (96/12). any ideas?
thanks.
Upvotes: 0
Views: 3602
Reputation: 353059
You can do a groupby on axis=1 (using some dummy data I made up):
>>> h = df.iloc[:2].astype(float)
>>> h.groupby(np.arange(len(h.columns))//12, axis=1).mean()
0 1 2 3 4 5 6 7
0 0.609643 0.452047 0.536786 0.377845 0.544321 0.214615 0.541185 0.544462
1 0.382945 0.596034 0.659157 0.437576 0.490161 0.435382 0.476376 0.423039
First we extract the data and force recognition of a float (the presence of the ------
row means that you've probably got an object
dtype, which will make the mean
unhappy.)
Then we make an array saying what groups we want to put the different columns in:
>>> np.arange(len(df.columns))//12
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7,
7, 7, 7, 7], dtype=int32)
which we feed as an argument to groupby
. .mean()
handles the rest.
Upvotes: 2
Reputation: 37023
It's always best to try to use pandas
methods when you can, rather than iterating over the rows. The DataFrame's iloc
method is useful for extracting any number of rows.
The following example shows you how to do what you want in a two-column DataFrame. The same technique will work independent of the number of columns:
In [14]: df = pd.DataFrame({"x": [1, 2, "-"], "y": [3, 4, "-"]})
In [15]: df
Out[15]:
x y
0 1 3
1 2 4
2 - -
In [16]: df.iloc[2] = df.iloc[0:2].sum()
In [17]: df
Out[17]:
x y
0 1 3
1 2 4
2 3 7
However, in your case you want to sum each group of eight cells in df.iloc[2]`, so you might be better simply taking the result of the summing expression with the statement
ds = df.iloc[0:2].sum()
which with your data will have the form
col1 0
col2 1
col3 2
col4 3
...
col93 92
col94 93
col95 94
col96 95
(These numbers are representative, you will obviously see your column sums). You can then turn this into a 12x8 matrix with
ds.values.reshape(12, 8)
whose value is
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29, 30, 31],
[32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53, 54, 55],
[56, 57, 58, 59, 60, 61, 62, 63],
[64, 65, 66, 67, 68, 69, 70, 71],
[72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87],
[88, 89, 90, 91, 92, 93, 94, 95]])
but summing this array will give you the sum of all elements, so instead create another DataFrame with
rs = pd.DataFrame(ds.values.reshape(12, 8))
and then sum that:
rs.sum()
giving
0 528
1 540
2 552
3 564
4 576
5 588
6 600
7 612
dtype: int64
You may find in practice that it is easier to simply create two 12x8 matrices in the first place, which you can add together before creating a dataframe which you can then sum. Much depends on how you are reading your data.
Upvotes: 1