yanadm

Reputation: 707

fill empty rows in columns with the average value of a certain interval in pandas

I have a dataframe with empty cells that I need to fill with the average of the preceding values over a certain range of scale. Part of the dataframe, as an example:

code  scale     s1   s2  s3
1111   -8        4   17  21
1111   -7       14   46  33
1111   -6       31   47  23
1111   -5        8   41  44
1111   -4       17    8  20
1111   -3        9   53  28
1111   -2        5    3   9
1111   -1        7    2  32
1111    0        4   39  31
1111    1       NaN NaN NaN 
1111    2       NaN NaN NaN 
1111    3       NaN NaN NaN 
1111    4       NaN NaN NaN 
1111    5       NaN NaN NaN 
2222   -8       27   29  45
2222   -7       51   15  43
2222   -6       16    1  28
2222   -5        6   29  52
2222   -4       25   10   5
2222   -3        5   53  22
2222   -2        9   41   6
2222   -1       49    2  17
2222    0       21    9   7
2222    1       NaN NaN NaN 
2222    2       NaN NaN NaN 
2222    3       NaN NaN NaN 
2222    4       NaN NaN NaN 
2222    5       NaN NaN NaN 
3333   -8       54   17  17
3333   -7       32   34   5
3333   -6       15   48  23
3333   -5       32   15  45
3333   -4       35   10  41
3333   -3       15   29  50
3333   -2       34   43  29
3333   -1       48   54  14
3333    0       27   51  37
3333    1       NaN NaN NaN 
3333    2       NaN NaN NaN 
3333    3       NaN NaN NaN 
3333    4       NaN NaN NaN 
3333    5       NaN NaN NaN
....            

Every interval where scale is between 1 and 5 is empty. I need to find each such interval and fill it with the mean of the column's values where scale is between -4 and 0. For example, for code 1111 and column s1, the mean is taken over the values 17, 9, 5, 7 and 4 (giving 8.4), and that value fills the empty cells for code 1111 where scale runs from 1 to 5. The same has to be done for every column and every empty interval. Each unique code has rows with scale running from -8 to 5. There can be many codes, and there may be several columns like s1, s2 and s3, but scale always runs from -8 to 5. Here is an example with the calculated values, which I hope makes this clearer.

code  scale     s1    s2    s3
1111   -8        4    17    21
1111   -7       14    46    33
1111   -6       31    47    23
1111   -5        8    41    44
1111   -4       17     8    20
1111   -3        9    53    28
1111   -2        5     3     9
1111   -1        7     2    32
1111    0        4    39    31
1111    1      8.4    21    24  
1111    2      8.4    21    24  
1111    3      8.4    21    24  
1111    4      8.4    21    24  
1111    5      8.4    21    24  
2222   -8       27    29    45
2222   -7       51    15    43
2222   -6       16     1    28
2222   -5        6    29    52
2222   -4       25    10     5
2222   -3        5    53    22
2222   -2        9    41     6
2222   -1       49     2    17
2222    0       21     9     7
2222    1     21.8    23  11.4  
2222    2     21.8    23  11.4
2222    3     21.8    23  11.4
2222    4     21.8    23  11.4
2222    5     21.8    23  11.4
3333   -8       54    17    17
3333   -7       32    34     5
3333   -6       15    48    23
3333   -5       32    15    45
3333   -4       35    10    41
3333   -3       15    29    50
3333   -2       34    43    29
3333   -1       48    54    14
3333    0       27    51    37
3333    1     31.8  37.4  34.2
3333    2     31.8  37.4  34.2
3333    3     31.8  37.4  34.2
3333    4     31.8  37.4  34.2  
3333    5     31.8  37.4  34.2
....            

Any ideas how to do this?

Upvotes: 2

Views: 1076

Answers (1)

IanS

Reputation: 16261

I assume code is the index of the dataframe (if it is a regular column, df.set_index('code') would make it the index).

First, get the mean values:

df[df['scale'].between(-4, 0)].groupby(level=0).mean()

This returns:

      scale    s1    s2    s3
code                         
1111     -2  8.40 21.00 24.00
2222     -2 21.80 23.00 11.40
3333     -2 31.80 37.40 34.20

Second, assign the values with combine_first, which replaces only NaN values:

df.combine_first(df[df['scale'].between(-4, 0)].groupby(level=0).mean())
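If code is an ordinary column rather than the index, one possible variant is sketched below. It is not the approach above but a swapped-in alternative: it computes the same per-code means, then looks them up with Series.map and fills only the missing cells with fillna. The sample frame is a cut-down copy of the question's data (two codes, scale from -4 to 2), and the names value_cols and filled are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data: two codes, scale from -4 to 2.
df = pd.DataFrame({
    "code": [1111] * 7 + [2222] * 7,
    "scale": list(range(-4, 3)) * 2,
    "s1": [17, 9, 5, 7, 4, np.nan, np.nan, 25, 5, 9, 49, 21, np.nan, np.nan],
    "s2": [8, 53, 3, 2, 39, np.nan, np.nan, 10, 53, 41, 2, 9, np.nan, np.nan],
})

value_cols = ["s1", "s2"]  # hypothetical name: the columns to fill

# Per-code means over the rows where scale is between -4 and 0 (inclusive).
means = df[df["scale"].between(-4, 0)].groupby("code")[value_cols].mean()

# Look up each row's code in the means and fill only the NaN cells.
filled = df.copy()
for col in value_cols:
    filled[col] = filled[col].fillna(filled["code"].map(means[col]))

print(filled)
```

Because fillna only touches NaN cells, the existing values for scale between -8 and 0 are left as they are, and each code's gap receives that code's own mean.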

Upvotes: 3
