Reputation: 4753
I have a single-row pandas DataFrame (column labels: float, values: float) like this one:
1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
0 1.0 2.0 5.0 4.0 3.0 NaN 1.0 7.0 NaN
I'd like to calculate the mean over specific column ranges. NaN shall be treated as 0.0. E.g. for constant column ranges relative to the overall column range (1.0 - 1.2, 1.3 - 1.5, 1.6 - 1.8) I'd like to get the following DataFrame as result:
1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8
0 2.66 2.66 2.66 2.33 2.33 2.33 2.66 2.66 2.66
What's the most performant and memory aware implementation to achieve this?
Upvotes: 2
Views: 67
Reputation: 862441
If you want the mean of every 3 consecutive columns, use GroupBy.transform with axis='columns', grouping by np.arange over the number of columns integer-divided by 3, and replace missing values with 0 first:
import numpy as np

# fill NaN with 0, then broadcast each 3-column group's mean back to its columns
df = df.fillna(0).groupby(np.arange(len(df.columns)) // 3, axis='columns').transform('mean')
print(df)
1.0 1.1 1.2 1.3 1.4 1.5 1.6 \
0 2.666667 2.666667 2.666667 2.333333 2.333333 2.333333 2.666667
1.7 1.8
0 2.666667 2.666667
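For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question. It groups on the transposed frame instead of passing axis='columns', since, as far as I know, the axis argument of DataFrame.groupby is deprecated in recent pandas releases (2.1 and later). The names cols, group_ids and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: one row, float column labels.
cols = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
df = pd.DataFrame(
    [[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
    columns=cols,
)

# One group id per column: [0 0 0 1 1 1 2 2 2]
group_ids = np.arange(len(df.columns)) // 3

# Transpose so the columns become rows, group along the default axis,
# broadcast each group's mean back with transform, then transpose back.
result = df.fillna(0).T.groupby(group_ids).transform('mean').T

print(result)
```

The double transpose copies the data, so for very wide frames this is not necessarily the most memory-friendly option.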
Detail:
print (np.arange(len(df.columns)))
[0 1 2 3 4 5 6 7 8]
print (np.arange(len(df.columns)) // 3)
[0 0 0 1 1 1 2 2 2]
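Since the groups are just consecutive blocks of equal size, the same result can probably be obtained with plain NumPy, which may be lighter than a groupby. This is only a sketch, and it assumes the number of columns is an exact multiple of the group size n; the names n, values, means and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: one row, float column labels.
cols = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
df = pd.DataFrame(
    [[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
    columns=cols,
)

n = 3  # group size; assumed to divide len(df.columns) evenly

# Treat NaN as 0.0 without touching the original frame.
values = df.to_numpy(dtype=float)
values = np.where(np.isnan(values), 0.0, values)

# Reshape to (rows, groups, n), average inside each group,
# then repeat every group mean n times to restore the original width.
means = values.reshape(len(df), -1, n).mean(axis=2)
result = pd.DataFrame(
    np.repeat(means, n, axis=1),
    index=df.index,
    columns=df.columns,
)

print(result)
```

If the column count is not a multiple of n, the reshape raises a ValueError, so the groupby approach is the safer choice for ragged groups.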
Upvotes: 2