I have a DataFrame with empty (NaN) cells that I need to fill with the average of earlier values within a certain range of scale. Part of the DataFrame, as an example:
code scale s1 s2 s3
1111 -8 4 17 21
1111 -7 14 46 33
1111 -6 31 47 23
1111 -5 8 41 44
1111 -4 17 8 20
1111 -3 9 53 28
1111 -2 5 3 9
1111 -1 7 2 32
1111 0 4 39 31
1111 1 NaN NaN NaN
1111 2 NaN NaN NaN
1111 3 NaN NaN NaN
1111 4 NaN NaN NaN
1111 5 NaN NaN NaN
2222 -8 27 29 45
2222 -7 51 15 43
2222 -6 16 1 28
2222 -5 6 29 52
2222 -4 25 10 5
2222 -3 5 53 22
2222 -2 9 41 6
2222 -1 49 2 17
2222 0 21 9 7
2222 1 NaN NaN NaN
2222 2 NaN NaN NaN
2222 3 NaN NaN NaN
2222 4 NaN NaN NaN
2222 5 NaN NaN NaN
3333 -8 54 17 17
3333 -7 32 34 5
3333 -6 15 48 23
3333 -5 32 15 45
3333 -4 35 10 41
3333 -3 15 29 50
3333 -2 34 43 29
3333 -1 48 54 14
3333 0 27 51 37
3333 1 NaN NaN NaN
3333 2 NaN NaN NaN
3333 3 NaN NaN NaN
3333 4 NaN NaN NaN
3333 5 NaN NaN NaN
....
Every interval where scale is between 1 and 5 is empty. I need to find each such interval and fill it with the mean of the column values where scale is between -4 and 0. For example, for code 1111 and column s1 the mean is taken over the values 17, 9, 5, 7 and 4 (giving 8.4), and that value goes into all the empty cells for code 1111 with scale from 1 to 5. This has to be done for each column and for each empty interval. Each unique code has values in the columns for scale from -8 to 5. There can be many codes and several columns like s1, s2 and s3, but scale always runs from -8 to 5. Below is an example with the calculated values, which I hope makes the task clearer.
code scale s1 s2 s3
1111 -8 4 17 21
1111 -7 14 46 33
1111 -6 31 47 23
1111 -5 8 41 44
1111 -4 17 8 20
1111 -3 9 53 28
1111 -2 5 3 9
1111 -1 7 2 32
1111 0 4 39 31
1111 1 8.4 21 24
1111 2 8.4 21 24
1111 3 8.4 21 24
1111 4 8.4 21 24
1111 5 8.4 21 24
2222 -8 27 29 45
2222 -7 51 15 43
2222 -6 16 1 28
2222 -5 6 29 52
2222 -4 25 10 5
2222 -3 5 53 22
2222 -2 9 41 6
2222 -1 49 2 17
2222 0 21 9 7
2222 1 21.8 23 11.4
2222 2 21.8 23 11.4
2222 3 21.8 23 11.4
2222 4 21.8 23 11.4
2222 5 21.8 23 11.4
3333 -8 54 17 17
3333 -7 32 34 5
3333 -6 15 48 23
3333 -5 32 15 45
3333 -4 35 10 41
3333 -3 15 29 50
3333 -2 34 43 29
3333 -1 48 54 14
3333 0 27 51 37
3333 1 31.8 37.4 34.2
3333 2 31.8 37.4 34.2
3333 3 31.8 37.4 34.2
3333 4 31.8 37.4 34.2
3333 5 31.8 37.4 34.2
....
Any ideas how to do this?
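As a quick arithmetic check of the rule (not a solution), the filled values in the expected table are the plain means of the five values at scale -4 to 0 for each code and column. The sketch below hard-codes those windows from the sample and uses Python's statistics.mean; the names window and fill_values are made up for illustration.

```python
from statistics import mean

# Values at scale -4, -3, -2, -1, 0, copied from the sample table above
window = {
    1111: {"s1": [17, 9, 5, 7, 4], "s2": [8, 53, 3, 2, 39], "s3": [20, 28, 9, 32, 31]},
    2222: {"s1": [25, 5, 9, 49, 21], "s2": [10, 53, 41, 2, 9], "s3": [5, 22, 6, 17, 7]},
    3333: {"s1": [35, 15, 34, 48, 27], "s2": [10, 29, 43, 54, 51], "s3": [41, 50, 29, 14, 37]},
}

# The value that should appear in every row with scale 1..5 for that code and column
fill_values = {
    code: {col: round(mean(vals), 1) for col, vals in cols.items()}
    for code, cols in window.items()
}

for code, cols in fill_values.items():
    print(code, cols)
```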
I assume code is the index of the DataFrame.
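If code is still an ordinary column, as in the question's table, a minimal sketch of moving it into the index with set_index looks like this. The frame below is a hypothetical miniature of the question's data, not the real one. Note that the resulting index is not unique, since each code labels one row per scale value; that is why groupby(level=0) groups the rows by code.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame: two codes, three scale values
df = pd.DataFrame({
    "code": [1111, 1111, 1111, 2222, 2222, 2222],
    "scale": [-1, 0, 1, -1, 0, 1],
    "s1": [7, 4, np.nan, 49, 21, np.nan],
})

# Move `code` into the index; it repeats once per scale value
df = df.set_index("code")

print(df.index.is_unique)          # False
print(df.groupby(level=0).size())  # three rows for each code
```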
First, get the mean values:
df[df['scale'].between(-4, 0)].groupby(level=0).mean()
This returns:
scale s1 s2 s3
code
1111 -2 8.40 21.00 24.00
2222 -2 21.80 23.00 11.40
3333 -2 31.80 37.40 34.20
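To try the mean step on the question's sample, the frame can be rebuilt as below; the observed dict and the construction loop are only an illustrative way of typing in the data. The filter keeps scale -4 to 0 and groupby(level=0) averages per code. The scale column is averaged too, which is why it shows -2.

```python
import numpy as np
import pandas as pd

# Observed values for scale -8..0, copied from the question; scale 1..5 is missing
observed = {
    1111: {"s1": [4, 14, 31, 8, 17, 9, 5, 7, 4],
           "s2": [17, 46, 47, 41, 8, 53, 3, 2, 39],
           "s3": [21, 33, 23, 44, 20, 28, 9, 32, 31]},
    2222: {"s1": [27, 51, 16, 6, 25, 5, 9, 49, 21],
           "s2": [29, 15, 1, 29, 10, 53, 41, 2, 9],
           "s3": [45, 43, 28, 52, 5, 22, 6, 17, 7]},
    3333: {"s1": [54, 32, 15, 32, 35, 15, 34, 48, 27],
           "s2": [17, 34, 48, 15, 10, 29, 43, 54, 51],
           "s3": [17, 5, 23, 45, 41, 50, 29, 14, 37]},
}

rows = []
for code, cols in observed.items():
    for i, scale in enumerate(range(-8, 6)):
        row = {"code": code, "scale": scale}
        for col, vals in cols.items():
            # Scale 1..5 (i >= 9) has no observation, so it stays NaN
            row[col] = vals[i] if i < len(vals) else np.nan
        rows.append(row)

# As in the answer, `code` is the (non-unique) index
df = pd.DataFrame(rows).set_index("code")

# Mean of every column over scale -4..0, one row per code
window_means = df[df["scale"].between(-4, 0)].groupby(level=0).mean()
print(window_means)
```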
Second, fill in the values with combine_first, which replaces only NaN values. It returns a new DataFrame, so assign the result back:
df = df.combine_first(df[df['scale'].between(-4, 0)].groupby(level=0).mean())
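An end-to-end sketch of both steps on the question's sample data, assuming code is the index; the data construction is the same illustrative loop as for the mean step, and the result is assigned back to df. combine_first aligns the per-code means to every row of that code and only writes into cells that are NaN.

```python
import numpy as np
import pandas as pd

# Observed values for scale -8..0, copied from the question; scale 1..5 is missing
observed = {
    1111: {"s1": [4, 14, 31, 8, 17, 9, 5, 7, 4],
           "s2": [17, 46, 47, 41, 8, 53, 3, 2, 39],
           "s3": [21, 33, 23, 44, 20, 28, 9, 32, 31]},
    2222: {"s1": [27, 51, 16, 6, 25, 5, 9, 49, 21],
           "s2": [29, 15, 1, 29, 10, 53, 41, 2, 9],
           "s3": [45, 43, 28, 52, 5, 22, 6, 17, 7]},
    3333: {"s1": [54, 32, 15, 32, 35, 15, 34, 48, 27],
           "s2": [17, 34, 48, 15, 10, 29, 43, 54, 51],
           "s3": [17, 5, 23, 45, 41, 50, 29, 14, 37]},
}

rows = []
for code, cols in observed.items():
    for i, scale in enumerate(range(-8, 6)):
        row = {"code": code, "scale": scale}
        for col, vals in cols.items():
            row[col] = vals[i] if i < len(vals) else np.nan
        rows.append(row)
df = pd.DataFrame(rows).set_index("code")

# Step 1: per-code means over scale -4..0
window_means = df[df["scale"].between(-4, 0)].groupby(level=0).mean()

# Step 2: keep every existing value in df and fill only its NaN cells,
# aligning on the index (code) and the column names
df = df.combine_first(window_means)

print(df[df["scale"].between(1, 5)])
```

Keep in mind that combine_first fills every NaN in df, not only the rows with scale 1 to 5. That matches the question, where those are the only empty cells; if the real data has other gaps, the fill would need to be restricted, for example with a boolean mask on scale.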