Geoffrey Stoel

Reputation: 1320

Distribute value equally across NaN in pandas

I have the following dataframe:

                     var_value
2016-07-01 05:10:00      809.0
2016-07-01 05:15:00        NaN
2016-07-01 05:20:00        NaN
2016-07-01 05:25:00        NaN
2016-07-01 05:30:00        NaN
2016-07-01 05:35:00        NaN
2016-07-01 05:40:00        NaN
2016-07-01 05:45:00        NaN
2016-07-01 05:50:00        NaN
2016-07-01 05:55:00        NaN
2016-07-01 06:00:00        NaN
2016-07-01 06:05:00        NaN
2016-07-01 06:10:00      185.0
2016-07-01 06:15:00        NaN
2016-07-01 06:20:00        NaN
2016-07-01 06:25:00        NaN
2016-07-01 06:30:00        NaN
2016-07-01 06:35:00        NaN
2016-07-01 06:40:00        NaN
2016-07-01 06:45:00        NaN
2016-07-01 06:50:00        NaN
2016-07-01 06:55:00        NaN
2016-07-01 07:00:00        NaN
2016-07-01 07:05:00        NaN

I want to distribute the 809.0 and 185.0 evenly across the rows. So my resulting dataframe should look like:

               var_value
7/1/2016 5:10    67.42 
7/1/2016 5:15    67.42 
7/1/2016 5:20    67.42 
7/1/2016 5:25    67.42 
7/1/2016 5:30    67.42 
7/1/2016 5:35    67.42 
7/1/2016 5:40    67.42 
7/1/2016 5:45    67.42 
7/1/2016 5:50    67.42 
7/1/2016 5:55    67.42 
7/1/2016 6:00    67.42 
7/1/2016 6:05    67.42 
7/1/2016 6:10    15.42 
7/1/2016 6:15    15.42 
7/1/2016 6:20    15.42 
7/1/2016 6:25    15.42 
7/1/2016 6:30    15.42 
7/1/2016 6:35    15.42 
7/1/2016 6:40    15.42 
7/1/2016 6:45    15.42 
7/1/2016 6:50    15.42 
7/1/2016 6:55    15.42 
7/1/2016 7:00    15.42 
7/1/2016 7:05    15.42 

The number of rows between the known values (the NaNs in this case) can vary. Here it happens to be 11 unknowns, but it could just as well be 10, 3, 7, etc.

Any help on solving this would be very much appreciated.

Upvotes: 2

Views: 1337

Answers (2)

jezrael

Reputation: 862761

You can first forward-fill the NaN values with ffill, then divide each value by its group size using GroupBy.transform:

df['var_value'] = df.var_value.ffill()
df['var_value'] = df['var_value'] / df.groupby('var_value')['var_value'].transform(len)

print (df)
                     var_value
2016-07-01 05:10:00  67.416667
2016-07-01 05:15:00  67.416667
2016-07-01 05:20:00  67.416667
2016-07-01 05:25:00  67.416667
2016-07-01 05:30:00  67.416667
2016-07-01 05:35:00  67.416667
2016-07-01 05:40:00  67.416667
2016-07-01 05:45:00  67.416667
2016-07-01 05:50:00  67.416667
2016-07-01 05:55:00  67.416667
2016-07-01 06:00:00  67.416667
2016-07-01 06:05:00  67.416667
2016-07-01 06:10:00  15.416667
2016-07-01 06:15:00  15.416667
2016-07-01 06:20:00  15.416667
2016-07-01 06:25:00  15.416667
2016-07-01 06:30:00  15.416667
2016-07-01 06:35:00  15.416667
2016-07-01 06:40:00  15.416667
2016-07-01 06:45:00  15.416667
2016-07-01 06:50:00  15.416667
2016-07-01 06:55:00  15.416667
2016-07-01 07:00:00  15.416667
2016-07-01 07:05:00  15.416667
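One detail to keep in mind: the group key above is the filled value itself, so this assumes two separate runs never start with the same value. A minimal sketch (with made-up data, not the question's full frame) that keys on run boundaries instead, so equal values in different runs stay in separate groups:

```python
import numpy as np
import pandas as pd

# Illustrative data: two runs of three rows each.
idx = pd.date_range('2016-07-01 05:10', periods=6, freq='5min')
df = pd.DataFrame(
    {'var_value': [809.0, np.nan, np.nan, 185.0, np.nan, np.nan]},
    index=idx)

# notna().cumsum() increments at every known value, giving one
# label per run: 1,1,1,2,2,2 -- independent of the values themselves.
runs = df['var_value'].notna().cumsum()
filled = df['var_value'].ffill()
df['var_value'] = filled / filled.groupby(runs).transform('size')
print(df)
```

Each run of three rows gets a third of its starting value (809/3 and 185/3), and the column still sums to 809 + 185.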

Comparing solutions:

len(df)=24:

In [18]: %timeit (jez(df))
1000 loops, best of 3: 1.18 ms per loop

In [19]: %timeit (pir(df1))
100 loops, best of 3: 2.92 ms per loop

len(df)=24k:

In [21]: %timeit (jez(df))
100 loops, best of 3: 7.49 ms per loop

In [22]: %timeit (pir(df1))
1 loop, best of 3: 590 ms per loop

Code for timings:

# to compare on 24k rows, uncomment:
#df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()
def jez(df):
    df['var_value'] = df.var_value.ffill()
    df['var_value'] = df['var_value'] / df.groupby('var_value')['var_value'].transform(len)
    return df    

def pir(df):
    df = df.fillna(0).groupby(df.var_value.notnull().cumsum()).transform(lambda x: x.mean())
    return df    


print (jez(df))
print (pir(df1))

Upvotes: 3

piRSquared

Reputation: 294328

df.fillna(0).groupby(df.notnull().cumsum()).transform(lambda x: x.mean())

2016-07-01 05:10:00    67.416667
2016-07-01 05:15:00    67.416667
2016-07-01 05:20:00    67.416667
2016-07-01 05:25:00    67.416667
2016-07-01 05:30:00    67.416667
2016-07-01 05:35:00    67.416667
2016-07-01 05:40:00    67.416667
2016-07-01 05:45:00    67.416667
2016-07-01 05:50:00    67.416667
2016-07-01 05:55:00    67.416667
2016-07-01 06:00:00    67.416667
2016-07-01 06:05:00    67.416667
2016-07-01 06:10:00    15.416667
2016-07-01 06:15:00    15.416667
2016-07-01 06:20:00    15.416667
2016-07-01 06:25:00    15.416667
2016-07-01 06:30:00    15.416667
2016-07-01 06:35:00    15.416667
2016-07-01 06:40:00    15.416667
2016-07-01 06:45:00    15.416667
2016-07-01 06:50:00    15.416667
2016-07-01 06:55:00    15.416667
2016-07-01 07:00:00    15.416667
2016-07-01 07:05:00    15.416667
Name: var_value, dtype: float64

Explanation

  • df.notnull().cumsum() creates a series that I can groupby with

  • df.fillna(0) ensures that the NaNs are included as 0 when I calculate the mean

  • transform(lambda x: x.mean()) replaces every element in a group with that group's mean.

df.notnull().cumsum()

2016-07-01 05:10:00    1
2016-07-01 05:15:00    1
2016-07-01 05:20:00    1
2016-07-01 05:25:00    1
2016-07-01 05:30:00    1
2016-07-01 05:35:00    1
2016-07-01 05:40:00    1
2016-07-01 05:45:00    1
2016-07-01 05:50:00    1
2016-07-01 05:55:00    1
2016-07-01 06:00:00    1
2016-07-01 06:05:00    1
2016-07-01 06:10:00    2
2016-07-01 06:15:00    2
2016-07-01 06:20:00    2
2016-07-01 06:25:00    2
2016-07-01 06:30:00    2
2016-07-01 06:35:00    2
2016-07-01 06:40:00    2
2016-07-01 06:45:00    2
2016-07-01 06:50:00    2
2016-07-01 06:55:00    2
2016-07-01 07:00:00    2
2016-07-01 07:05:00    2
Name: var_value, dtype: int64
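The steps above can be put together in a self-contained sketch (using a small made-up series, not the question's full frame):

```python
import numpy as np
import pandas as pd

# Illustrative series: one run of four rows, one run of two.
s = pd.Series(
    [809.0, np.nan, np.nan, np.nan, 185.0, np.nan],
    index=pd.date_range('2016-07-01 05:10', periods=6, freq='5min'),
    name='var_value')

# notnull().cumsum() labels each run: 1,1,1,1,2,2.
key = s.notnull().cumsum()

# fillna(0) counts the NaNs as 0, so each group's mean is
# (known value) / (group size).
out = s.fillna(0).groupby(key).transform(lambda x: x.mean())
print(out)
```

Here the first run gets 809/4 and the second 185/2, so the series still sums to 809 + 185.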

Upvotes: 3
