How can I calculate the mean over a series of specific, constant width column ranges?

Question

I have a single-row pandas DataFrame (float column labels, float values) like this one:

    1.0     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8
0   1.0     2.0     5.0     4.0     3.0     NaN     1.0     7.0     NaN

I'd like to calculate the mean over specific column ranges. NaN should be treated as 0.0. For example, with constant-width column ranges covering the whole column range (1.0 - 1.2, 1.3 - 1.5, 1.6 - 1.8), I'd like to get the following DataFrame as the result:

    1.0     1.1     1.2     1.3     1.4     1.5     1.6     1.7     1.8
0   2.66    2.66    2.66    2.33    2.33    2.33    2.66    2.66    2.66

What's the most performant and memory-efficient way to achieve this?

jezrael · Accepted Answer

If you want the mean of every 3 consecutive columns, use GroupBy.transform with axis='columns', grouping by the integer division of np.arange(len(df.columns)) by 3, and replace missing values with 0 first:

import numpy as np

df = df.fillna(0).groupby(np.arange(len(df.columns)) // 3, axis='columns').transform('mean')
print (df)
        1.0       1.1       1.2       1.3       1.4       1.5       1.6  \
0  2.666667  2.666667  2.666667  2.333333  2.333333  2.333333  2.666667   

        1.7       1.8  
0  2.666667  2.666667
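
Note that the axis argument of DataFrame.groupby was deprecated in pandas 2.1, so the call above may warn or fail on newer versions. A minimal sketch of an equivalent that avoids axis by transposing, grouping the rows, and transposing back (the sample frame is reconstructed from the question, and the names width and group_ids are only for illustration):

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame(
    [[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
    columns=[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
)

width = 3  # assumed constant number of columns per group
group_ids = np.arange(len(df.columns)) // width  # one group label per column

# After .T the columns become rows, so a plain row-wise groupby is enough.
result = df.fillna(0).T.groupby(group_ids).transform('mean').T
print(result)
```

The transposes cost an extra copy of the data, which is negligible for a single row but worth keeping in mind for large frames.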

Detail:

print (np.arange(len(df.columns)))
[0 1 2 3 4 5 6 7 8]

print (np.arange(len(df.columns)) // 3)
[0 0 0 1 1 1 2 2 2]
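
Since the question asks about performance and memory, here is a rough sketch that uses the same group labels but works directly on the underlying NumPy array instead of groupby. It is an alternative to the approach above, not a replacement for it, and it has not been benchmarked here. The helper name blockwise_mean and its width parameter are made up for illustration:

```python
import numpy as np
import pandas as pd


def blockwise_mean(df, width):
    """Hypothetical helper: replace each value by the mean of its block of `width` columns."""
    values = np.nan_to_num(df.to_numpy(dtype=float), nan=0.0)  # NaN -> 0.0
    n_cols = values.shape[1]
    starts = np.arange(0, n_cols, width)            # first column position of each block
    sums = np.add.reduceat(values, starts, axis=1)  # per-block sums, shape (rows, blocks)
    counts = np.diff(np.append(starts, n_cols))     # block sizes (the last may be shorter)
    group_ids = np.arange(n_cols) // width          # e.g. [0 0 0 1 1 1 2 2 2] for width 3
    # Broadcast each block mean back to every column of that block.
    return pd.DataFrame((sums / counts)[:, group_ids], index=df.index, columns=df.columns)


# Sample data reconstructed from the question.
df = pd.DataFrame(
    [[1.0, 2.0, 5.0, 4.0, 3.0, np.nan, 1.0, 7.0, np.nan]],
    columns=[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
)
result = blockwise_mean(df, 3)
print(result)
```

If the number of columns is not a multiple of width, the last block is simply shorter and is averaged over its actual size.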
