Calculate the average of the n-th number of elements in the column in pandas

Question

I have the following dataframe:

             df1
index   year   week   a     b     c
 -10    2017    10   45    26    19
  -9    2017    11   37    23    14
  -8    2017    12   21    66    19
  -7    2017    13   47    36    92
  -6    2017    14   82    65    18
  -5    2017    15   68    68    19
  -4    2017    16   30    95    24
  -3    2017    17   21    15    94
  -2    2017    18   67    30    16
  -1    2017    19   10    13    13
   0    2017    20   26    22    18
   1    2017    21   NaN   NaN   NaN
   2    2017    22   NaN   NaN   NaN
   3    2017    23   NaN   NaN   NaN
   4    2017    24   NaN   NaN   NaN
   ...
   53   2018    20   NaN   NaN   NaN

For each empty cell, I need to calculate the average of the previous n values in that column and write the result into the cell. n is equal to the number of indexes from zero and up. For example, for the first empty cell in column a, I must calculate the average of the values at indexes -10 through 0; for the next cell, the values at indexes -9 through 1; and so on. The same applies to columns b and c. The calculations always begin where index = 1.

The problem is that the number of columns like a, b, c can vary, but I know that these columns will always come after the column week. Is it possible to apply these calculations to an arbitrary number of columns, given that they are all located after the column week?

I tried hard to find anything, but I could not find anything suitable.

UPD: If this helps, the maximum number of rows from index = 0 and down will be 53.

Rudresh Panchal · Accepted Answer

You can do this with a bit of pandas and numpy. Assuming you know the position of the week column (even if you don't, a simple search will find it), for example that week is the 3rd column, you can do something like:

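import numpy as np
import pandas as pd

# data is your dataframe name
# take every column after week (week is assumed to be the 3rd column)
column_list = list(data.columns.values)[3:]
for column_name in column_list:
    # work on a float copy of the column as a numpy array
    column = data[column_name].to_numpy(dtype=float, copy=True)
    # iterate over the entries in the column
    for index in range(column.shape[0]):
        if np.isnan(column[index]):
            # mean of the window index-10..index; the NaN at index is ignored,
            # and mode='wrap' wraps indices that fall before the start of the array
            column[index] = np.nanmean(column.take(range(index - 10, index + 1), mode='wrap'))
    # write the filled values back to the dataframe
    data[column_name] = column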

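This is a bad unvectorized solution, but it should work fine. It replaces each NaN entry with the mean of the previous 10 entries (the NaN at the current position is ignored by np.nanmean), wrapping around to the end of the column when fewer than 10 entries precede it. If you instead want only the previous entries without the wrap-around, simply take however many are available when there are fewer than 10, like
column[index] = np.nanmean(column[max(0, index - 10):index + 1])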

Hope this helps!
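
The same loop can be packaged as a function that locates the value columns by the name of the week column rather than by a hard-coded position. This is only a minimal sketch under stated assumptions: the function name fill_with_trailing_mean and its parameters after and window are hypothetical (they are not part of pandas), window=11 assumes the question's example (indexes -10 through 0) means eleven previous values, and no wrap-around is used. Values filled earlier feed into later averages, which is why the fill stays a loop instead of a single rolling mean.

```python
import numpy as np
import pandas as pd


def fill_with_trailing_mean(data, after="week", window=11):
    """Return a copy of `data` in which every NaN in the columns located
    after column `after` is replaced by the mean of the previous `window`
    values of the same column (hypothetical helper, not a pandas API)."""
    out = data.copy()
    # position of the first value column, right after `after`
    start = out.columns.get_loc(after) + 1
    for name in out.columns[start:]:
        values = out[name].to_numpy(dtype=float, copy=True)
        for i in range(len(values)):
            if np.isnan(values[i]):
                previous = values[max(0, i - window):i]
                # skip cells that have no usable history yet
                if previous.size and not np.isnan(previous).all():
                    values[i] = np.nanmean(previous)
        out[name] = values
    return out


# usage with column a from the question, shortened to two empty rows
df1 = pd.DataFrame({
    "year": [2017] * 13,
    "week": list(range(10, 23)),
    "a": [45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26, np.nan, np.nan],
})
filled = fill_with_trailing_mean(df1)
```

Passing window=10 would reproduce the window size used in the answer above.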
