user15649753
user15649753

Reputation:

how to find average of a couple of columns and make a new data set?

I have a data set how I can find the average of numerical columns for each 30 rows and make a smaller data set?

For example:

data

A    B     C
2.   4     f
3    1     d
4    2     p
3    1     l
2    0     g

Lets I wanna find the average of each 2 rows (for simplification).

A          B            C
(2+3)/2    (4+1)/2      d
(4+3)/2    (2+1)/2      l
2           0           g

Upvotes: 0

Views: 37

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35686

Try dividing the index by n to get groups then use groupby aggregate:

n = 2
new_df = df.groupby(df.index // n).agg({'A': 'mean', 'B': 'mean', 'C': 'last'})

new_df:

     A    B  C
0  2.5  2.5  d
1  3.5  1.5  l
2  2.0  0.0  g

Or the aggregation dict can be built programmatically by dtype here object dtypes are aggregated by taking the last value, all other types are aggregated with the mean:

import numpy as np

n = 2
aggregation_d = dict(
    zip(df.columns, np.where(df.dtypes != 'object', 'mean', 'last'))
)
new_df = df.groupby(df.index // n).agg(aggregation_d)

aggregation_d:

{'A': 'mean', 'B': 'mean', 'C': 'last'}

new_df:

     A    B  C
0  2.5  2.5  d
1  3.5  1.5  l
2  2.0  0.0  g

Explanation:

df.index // n

Creates a Series of values:

Int64Index([0, 0, 1, 1, 2], dtype='int64')

These values establish a group.

Then the dict of values specifies how to aggregate each group:

{'A': 'mean', 'B': 'mean', 'C': 'last'}

Take the mean of A and B, and the last value of C.

Upvotes: 1

Related Questions