Reputation: 707
I have the following dataframe:
df1
index year week a b c
-10 2017 10 45 26 19
-9 2017 11 37 23 14
-8 2017 12 21 66 19
-7 2017 13 47 36 92
-6 2017 14 82 65 18
-5 2017 15 68 68 19
-4 2017 16 30 95 24
-3 2017 17 21 15 94
-2 2017 18 67 30 16
-1 2017 19 10 13 13
0 2017 20 26 22 18
1 2017 21 NaN NaN NaN
2 2017 22 NaN NaN NaN
3 2017 23 NaN NaN NaN
4 2017 24 NaN NaN NaN
...
53 2018 20 NaN NaN NaN
For each empty cell, I need to calculate the average of the previous n values in that column and write it into the cell. Here n is the number of rows from index 0 upwards in the table (indexes 0 back to -10, so 11 in this example). For example, for the first empty cell in column a I must calculate the average of the values at indexes -10 through 0, for the next cell the average of indexes -9 through 1, and so on. The same applies to columns a, b and c. The calculations always begin at index = 1.
The problem is that the number of columns such as a, b, c can vary, but I know that these columns will always come after the column week. Is it possible to apply these calculations to an arbitrary number of columns, given that they are always located after the column week?
I tried hard to find anything, but I could not find anything suitable.
UPD: If this helps, the maximum number of rows from index = 0 and down will be 53.
Upvotes: 4
Views: 1371
Reputation: 2424
This can be done as follows:
n = 11  # window size, taken from the example in your explanation
df = df1.loc[range(1, df1.index[-1] + 1)]  # select the rows from index 1 onward
df should look like:
year week a b c
index
1 2017 21 NaN NaN NaN
2 2017 22 NaN NaN NaN
3 2017 23 NaN NaN NaN
4 2017 24 NaN NaN NaN
Then:
for s in list(df.index):  # iterate through the rows with NaN values
    for i in range(2, df.columns.size):  # iterate through the value columns ('a', 'b', 'c' or more)
        # mean of the n rows just before row s, including rows filled in earlier iterations
        df1.loc[s, df.columns[i]] = df1.loc[range(s - n, s), df.columns[i]].sum() / n
print(df1)
Please note that here I followed your example and assumed that year will always be the first column and week the second, so that every column after week is selected, and that index is the actual index of the dataframe.
Output:
year week a b c
index
-10 2017 10 45.000000 26.000000 19.000000
-9 2017 11 37.000000 23.000000 14.000000
-8 2017 12 21.000000 66.000000 19.000000
-7 2017 13 47.000000 36.000000 92.000000
-6 2017 14 82.000000 65.000000 18.000000
-5 2017 15 68.000000 68.000000 19.000000
-4 2017 16 30.000000 95.000000 24.000000
-3 2017 17 21.000000 15.000000 94.000000
-2 2017 18 67.000000 30.000000 16.000000
-1 2017 19 10.000000 13.000000 13.000000
0 2017 20 26.000000 22.000000 18.000000
1 2017 21 41.272727 41.727273 31.454545
2 2017 22 40.933884 43.157025 32.586777
3 2017 23 41.291510 44.989482 34.276484
4 2017 24 43.136193 43.079434 35.665255
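If the position of week is not fixed, the value columns can also be located by name. The following is only a sketch under stated assumptions, not a replacement for the loop above: it rebuilds the sample rows from the question, assumes index is the dataframe index, and uses value_cols as an illustrative name for everything to the right of week. The rows are still filled one by one, because each mean can include values filled in earlier steps.

```python
import numpy as np
import pandas as pd

# Sample data from the question: indexes -10..0 are known, 1..4 must be filled.
df1 = pd.DataFrame(
    {
        "year": 2017,
        "week": range(10, 25),
        "a": [45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26] + [np.nan] * 4,
        "b": [26, 23, 66, 36, 65, 68, 95, 15, 30, 13, 22] + [np.nan] * 4,
        "c": [19, 14, 19, 92, 18, 19, 24, 94, 16, 13, 18] + [np.nan] * 4,
    },
    index=pd.RangeIndex(-10, 5, name="index"),
)

n = 11  # window length: rows -10..0 for the first empty row

# Every column to the right of 'week', however many there are
# (value_cols is an illustrative name, not from the original answer).
value_cols = df1.columns[df1.columns.get_loc("week") + 1:]

# Fill rows in order; label slices in .loc include both endpoints,
# so s - n .. s - 1 covers exactly the n rows before row s.
for s in df1.index[df1.index >= 1]:
    df1.loc[s, value_cols] = df1.loc[s - n:s - 1, value_cols].mean()

print(df1.loc[1:4, value_cols].round(6))
```

On the sample data this should give the same filled rows as the output shown above.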
Upvotes: 2
Reputation: 1000
You can do this with a bit of pandas and numpy. Assuming you know the position of the week column (even if you don't, a simple search will find it), for example that week is the 3rd column, you can do something like:
import numpy as np
import pandas as pd

# data is your dataframe name
column_list = list(data.columns.values)[3:]
for column_name in column_list:
    # convert the pandas Series to a writable float numpy array
    column = data[column_name].to_numpy(dtype=float, copy=True)
    # iterate over the entries in the column
    for index in range(column.shape[0]):
        if np.isnan(column[index]):
            # the window includes column[index] itself, which nanmean ignores
            column[index] = np.nanmean(column.take(range(index - 10, index + 1), mode='wrap'))
    # write the filled values back to the dataframe
    data[column_name] = column
This is a slow, unvectorized solution, but it should work fine. It replaces every NaN entry with the mean of the previous 10 entries, wrapping around to the end of the column when fewer than 10 entries precede it. If you instead want only the entries that actually precede the cell, without wrap-around, simply use the first n entries when n is less than 10, like:
column[index] = np.nanmean(column[max(0, index - 10):index + 1])
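To make the non-wrapping variant concrete, here is a minimal sketch that wraps it in a function. The name fill_with_trailing_mean and its window parameter are made up for illustration, and the sample values are column a from the question.

```python
import numpy as np

def fill_with_trailing_mean(values, window=10):
    """Replace each NaN with the mean of up to `window` preceding entries."""
    out = np.asarray(values, dtype=float).copy()
    for i in range(len(out)):
        if np.isnan(out[i]):
            start = max(0, i - window)  # clip at the start instead of wrapping
            # out[i] is NaN here, so nanmean only averages the entries before it.
            # If nothing precedes it (i == 0), the result stays NaN.
            out[i] = np.nanmean(out[start:i + 1])
    return out

# Column 'a' from the question, followed by two empty cells.
a = [45, 37, 21, 47, 82, 68, 30, 21, 67, 10, 26, np.nan, np.nan]
filled = fill_with_trailing_mean(a)
print(filled[-2:])
```

With the default window the mean covers 10 preceding entries. Passing window=11 should match the rule described in the question, where the first empty cell averages indexes -10 through 0.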
Hope this helps!
Upvotes: 1