Fill empty rows in columns with the average value of a certain interval in pandas

Question

I have a dataframe with empty cells that I need to fill with the average of earlier values of the same column, taken over a certain interval of scale. Part of the dataframe as an example:

code  scale     s1   s2  s3
1111   -8        4   17  21
1111   -7       14   46  33
1111   -6       31   47  23
1111   -5        8   41  44
1111   -4       17    8  20
1111   -3        9   53  28
1111   -2        5    3   9
1111   -1        7    2  32
1111    0        4   39  31
1111    1       NaN NaN NaN 
1111    2       NaN NaN NaN 
1111    3       NaN NaN NaN 
1111    4       NaN NaN NaN 
1111    5       NaN NaN NaN 
2222   -8       27   29  45
2222   -7       51   15  43
2222   -6       16    1  28
2222   -5        6   29  52
2222   -4       25   10   5
2222   -3        5   53  22
2222   -2        9   41   6
2222   -1       49    2  17
2222    0       21    9   7
2222    1       NaN NaN NaN 
2222    2       NaN NaN NaN 
2222    3       NaN NaN NaN 
2222    4       NaN NaN NaN 
2222    5       NaN NaN NaN 
3333   -8       54   17  17
3333   -7       32   34   5
3333   -6       15   48  23
3333   -5       32   15  45
3333   -4       35   10  41
3333   -3       15   29  50
3333   -2       34   43  29
3333   -1       48   54  14
3333    0       27   51  37
3333    1       NaN NaN NaN 
3333    2       NaN NaN NaN 
3333    3       NaN NaN NaN 
3333    4       NaN NaN NaN 
3333    5       NaN NaN NaN
....

Every interval where scale is between 1 and 5 is empty. I need to find each such interval and fill it with the mean of the column's values where scale is between -4 and 0, within the same code. For example, for code 1111 and column s1, the mean is taken over the values 17, 9, 5, 7, 4 (which gives 8.4), and that value fills the empty cells for code 1111 where scale runs from 1 to 5. The same has to be done for each column and for each empty interval.

Each unique code has rows for scale from -8 to 5. There can be many codes and several columns like s1, s2 and s3, but the scale always runs from -8 to 5. Here is an example of the expected result, which I hope makes it clearer:

code  scale     s1    s2    s3
1111   -8        4    17    21
1111   -7       14    46    33
1111   -6       31    47    23
1111   -5        8    41    44
1111   -4       17     8    20
1111   -3        9    53    28
1111   -2        5     3     9
1111   -1        7     2    32
1111    0        4    39    31
1111    1      8.4    21    24  
1111    2      8.4    21    24  
1111    3      8.4    21    24  
1111    4      8.4    21    24  
1111    5      8.4    21    24  
2222   -8       27    29    45
2222   -7       51    15    43
2222   -6       16     1    28
2222   -5        6    29    52
2222   -4       25    10     5
2222   -3        5    53    22
2222   -2        9    41     6
2222   -1       49     2    17
2222    0       21     9     7
2222    1     21.8    23  11.4  
2222    2     21.8    23  11.4
2222    3     21.8    23  11.4
2222    4     21.8    23  11.4
2222    5     21.8    23  11.4
3333   -8       54    17    17
3333   -7       32    34     5
3333   -6       15    48    23
3333   -5       32    15    45
3333   -4       35    10    41
3333   -3       15    29    50
3333   -2       34    43    29
3333   -1       48    54    14
3333    0       27    51    37
3333    1     31.8  37.4  34.2
3333    2     31.8  37.4  34.2
3333    3     31.8  37.4  34.2
3333    4     31.8  37.4  34.2  
3333    5     31.8  37.4  34.2
....

Any ideas how to do this?

IanS · Accepted Answer

I assume code is the index of the dataframe.

First, get the mean values:

df[df['scale'].between(-4, 0)].groupby(level=0).mean()

This returns:

      scale    s1    s2    s3
code                         
1111     -2  8.40 21.00 24.00
2222     -2 21.80 23.00 11.40
3333     -2 31.80 37.40 34.20
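
For anyone who wants to reproduce this step, here is a minimal sketch that rebuilds part of the sample (codes 1111 and 2222 only) with code as the index, as assumed above. The names window and means are mine, and dropping scale before averaging is an optional tidy-up rather than part of the one-liner shown earlier.

```python
import numpy as np
import pandas as pd

# Sample data for two codes, copied from the question (scale runs -8..5).
scale = list(range(-8, 6))
nan5 = [np.nan] * 5
df = pd.DataFrame(
    {
        "scale": scale * 2,
        "s1": [4, 14, 31, 8, 17, 9, 5, 7, 4] + nan5
              + [27, 51, 16, 6, 25, 5, 9, 49, 21] + nan5,
        "s2": [17, 46, 47, 41, 8, 53, 3, 2, 39] + nan5
              + [29, 15, 1, 29, 10, 53, 41, 2, 9] + nan5,
        "s3": [21, 33, 23, 44, 20, 28, 9, 32, 31] + nan5
              + [45, 43, 28, 52, 5, 22, 6, 17, 7] + nan5,
    },
    index=pd.Index([1111] * 14 + [2222] * 14, name="code"),
)

# Rows in the reference window: -4 <= scale <= 0 (between is inclusive).
window = df[df["scale"].between(-4, 0)]

# One row of means per code; scale is dropped because its mean is not needed.
means = window.drop(columns="scale").groupby(level=0).mean()
print(means)
```

The printed means should match the s1, s2 and s3 values in the table above for codes 1111 and 2222.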

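Second, assign the values with combine_first, which fills only the NaN values of df, taking them from the other dataframe at the matching index label (here, code) and column. It returns a new dataframe, so assign the result if you want to keep it: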

df.combine_first(df[df['scale'].between(-4, 0)].groupby(level=0).mean())
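
If code is an ordinary column rather than the index, one possible alternative is to compute the per-code means and join them back on code before filling. This is only a sketch under that assumption; fill_from_window and its parameters are hypothetical names, not part of pandas or of the approach above.

```python
import numpy as np
import pandas as pd


def fill_from_window(df, value_cols, lo=-4, hi=0):
    """Fill NaNs in value_cols with the per-code mean over lo <= scale <= hi.

    Hypothetical helper: assumes columns named "code" and "scale" exist.
    """
    out = df.copy()
    in_window = out["scale"].between(lo, hi)  # inclusive on both ends
    # One row of means per code, computed from the window rows only.
    window_means = out.loc[in_window].groupby("code")[value_cols].mean()
    # Look up each row's code in window_means; the result keeps out's index.
    per_row = out[["code"]].join(window_means, on="code")[value_cols]
    # fillna touches only the missing cells; existing values are kept.
    out[value_cols] = out[value_cols].fillna(per_row)
    return out


# Usage with the rows for code 1111 from the question.
df = pd.DataFrame(
    {
        "code": [1111] * 14,
        "scale": list(range(-8, 6)),
        "s1": [4, 14, 31, 8, 17, 9, 5, 7, 4] + [np.nan] * 5,
        "s2": [17, 46, 47, 41, 8, 53, 3, 2, 39] + [np.nan] * 5,
        "s3": [21, 33, 23, 44, 20, 28, 9, 32, 31] + [np.nan] * 5,
    }
)
filled = fill_from_window(df, ["s1", "s2", "s3"])
print(filled.tail(5))
```

A code with no rows in the window gets a NaN mean, so its empty cells simply stay empty.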
