thinwybk

Reputation: 4753

How can I calculate the mean over a series of specific, constant width column ranges?

I have a single-row pandas DataFrame (column labels: float, values: float) like this one:

    1.0     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8
0   1.0     2.0     5.0     4.0     3.0     NaN     1.0     7.0     NaN
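To make the example reproducible, the frame above can be built like this. This is a sketch; the variable names are my own. The column labels are typed out explicitly rather than generated from a float range, because a float range could produce labels like 1.3000000000000003.

```python
import numpy as np
import pandas as pd

# Single-row frame with float column labels 1.0 ... 1.8, as in the question
cols = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8]
df = pd.DataFrame([[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]], columns=cols)
print(df)
```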

I'd like to calculate the mean over specific column ranges, treating NaN as 0.0. For example, with consecutive, constant-width column ranges (1.0 - 1.2, 1.3 - 1.5, 1.6 - 1.8), I'd like to get the following DataFrame as the result:

    1.0     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8
0   2.66    2.66    2.66    2.33    2.33    2.33    2.66    2.66    2.66
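With NaN counted as 0, each expected value is simply the arithmetic mean of its three-column block. The figures above are truncated; rounded, they are 2.67 and 2.33:

```latex
\bar{x}_{[1.0,\,1.2]} = \frac{1 + 2 + 5}{3} = \frac{8}{3} \approx 2.67, \qquad
\bar{x}_{[1.3,\,1.5]} = \frac{4 + 3 + 0}{3} = \frac{7}{3} \approx 2.33, \qquad
\bar{x}_{[1.6,\,1.8]} = \frac{1 + 7 + 0}{3} = \frac{8}{3} \approx 2.67
```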

What's the most performant and memory-efficient implementation to achieve this?

Upvotes: 2

Views: 67

Answers (1)

jezrael

Reputation: 862441

If you want the mean of each group of 3 consecutive columns, use GroupBy.transform with axis='columns'. Group by the integer division of np.arange(len(df.columns)) by 3, and replace missing values with 0 first:

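import numpy as np
# Treat NaN as 0, then give every column the mean of its block of 3 adjacent columns
df = df.fillna(0).groupby(np.arange(len(df.columns)) // 3, axis='columns').transform('mean')
print(df)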
        1.0       1.1       1.2       1.3       1.4       1.5       1.6  \
0  2.666667  2.666667  2.666667  2.333333  2.333333  2.333333  2.666667   

        1.7       1.8  
0  2.666667  2.666667  
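Note that pandas 2.1 deprecated the `axis` argument of `DataFrame.groupby` (here `axis='columns'`) and suggests `frame.T.groupby(...)` instead. Below is a sketch of the same computation written that way. It assumes the sample frame from the question and should also run on older pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
                  columns=[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8])

# One group label per column: 0,0,0,1,1,1,2,2,2
labels = np.arange(len(df.columns)) // 3

# Transpose so the columns become rows, group those rows, then transpose back
out = df.fillna(0).T.groupby(labels).transform('mean').T
print(out)
```

For a one-row frame the two transposes are cheap. On large frames they may copy data, so it's worth measuring if memory matters.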

Detail:

print(np.arange(len(df.columns)))
[0 1 2 3 4 5 6 7 8]

print(np.arange(len(df.columns)) // 3)
[0 0 0 1 1 1 2 2 2]
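Because these labels form contiguous blocks of equal size, the same result can also be sketched with plain NumPy:

1. Reshape the values to (rows, blocks, width).
2. Average over the last axis.
3. Use `np.repeat` to expand the block means back out.

`block_mean` and `width` are hypothetical names. This sketch assumes the column count is a multiple of `width`; the `// 3` grouping above would instead just produce a shorter last group. I haven't benchmarked it against the groupby version, so any speed or memory advantage is an assumption to verify on your own data.

```python
import numpy as np
import pandas as pd


def block_mean(df: pd.DataFrame, width: int) -> pd.DataFrame:
    """Hypothetical helper: replace every value with the mean of its block of
    `width` adjacent columns, counting NaN as 0."""
    n_rows, n_cols = df.shape
    if n_cols % width != 0:
        raise ValueError("number of columns must be a multiple of width")
    # NaN -> 0, then a plain float ndarray of shape (rows, cols)
    values = df.fillna(0).to_numpy(dtype=float)
    # (rows, cols) -> (rows, blocks, width), then average inside each block
    means = values.reshape(n_rows, n_cols // width, width).mean(axis=2)
    # Repeat each block mean `width` times to get back to (rows, cols)
    return pd.DataFrame(np.repeat(means, width, axis=1),
                        index=df.index, columns=df.columns)


df = pd.DataFrame([[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
                  columns=[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8])
print(block_mean(df, 3))
```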

Upvotes: 2
