Kyle C

Reputation: 121

Averaging a few values in a dataframe to fill NaN values in the same column

I have a dataframe where the data progresses in one-hour intervals, but one of the columns has a few NaN values. When I encounter a NaN, I would like the code to average the four hours prior to the NaN and then replace the NaN with that average.

I tried modifying the answer to this question: pandas DataFrame: replace nan values with average of columns, but that answer takes the average of the whole column rather than of part of the column.
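The difference between the linked answer and what is wanted here can be sketched on a small made-up series. This assumes rows are in chronological order (oldest first) and uses a hypothetical column name `temp`: `fillna(s.mean())` uses the whole column, while `shift(1)` followed by `rolling(4, min_periods=1)` averages only the up-to-four values before each row.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly readings, oldest first, with one gap
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, np.nan, 7.0, 8.0], name="temp")

# Whole-column mean (what the linked answer does): every NaN gets the same value
whole = s.fillna(s.mean())

# Mean of the (up to) four rows before each position; shift(1) excludes the row itself
prev4 = s.shift(1).rolling(4, min_periods=1).mean()
windowed = s.fillna(prev4)

# windowed[5] is 3.5 (mean of 2, 3, 4, 5); whole[5] is the column mean, about 4.29
print(whole[5], windowed[5])
```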

My Dataframe:

0     1        2    ...     6      7     8  
3   12:53    2.778  ...   -12.2  -16.7  69%         
4   11:53    3.611  ...    NaN   -17.2  73%        
5   10:53    2.778  ...   -13.7  -17.2  73%       
6   09:53    3.611  ...   -13.3  -17.2  73%        
7   08:53    2.778  ...   -12.8  -16.7  76%        
… 
74  13:53    0.278  ...    -15   -17.2  83%

Here is my modified attempt at averaging four values from the column:

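for i in df.index:
    # iloc[0:3] always selects the first three rows of column 6 (the slice
    # end is exclusive), no matter where the NaN is, so every NaN gets the
    # same value
    df[6].fillna(df[6].iloc[0:3].mean(), inplace=True)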

This fills the NaN with the mean of the first rows of column 6 (iloc[0:3] actually covers three rows, not four), which explains the result I am getting, but I do not know how to make it average the 4 rows after the NaN instead.
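Because the rows in the question are ordered newest first (12:53, 11:53, 10:53, ...), the four hours before a NaN are the rows after it. A minimal sketch, assuming an integer column label 6 and using only the values visible in the question: reverse the column so it runs oldest first, take the shifted rolling mean, and let `fillna` realign the result on the index labels. With `min_periods=1`, fewer than four rows are averaged when fewer are available, as happens here.

```python
import numpy as np
import pandas as pd

# Column 6 from the question's visible rows, newest first
df = pd.DataFrame(
    {6: [-12.2, np.nan, -13.7, -13.3, -12.8]},
    index=[3, 4, 5, 6, 7],
)

# Reverse to oldest-first, then average the up-to-four rows before each one
oldest_first = df[6].iloc[::-1]
prior4 = oldest_first.shift(1).rolling(4, min_periods=1).mean()

# fillna aligns on the index labels, so the reversed order does not matter
df[6] = df[6].fillna(prior4)
print(df)
```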

Upvotes: 1

Views: 1017

Answers (3)

Kyle C

Reputation: 121

Using the code provided by Joe Patten worked great until I had a NaN in the top row of the column (a trailing window ending at the top row contains only that NaN, so its rolling mean is NaN too). To solve this, I check whether the first value is NaN. If it is, I reverse the dataframe, run the rolling average from what was the bottom up, and then restore the original orientation. If the first value is not NaN, I run the rolling average code as is.

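import pandas as pd

# df.isnull().loc[3, 6] is a bool, so comparing it with the string 'True'
# was never true; pd.isna() tests the first value of column 6 directly
if pd.isna(df[6].iloc[0]):
    # Reverse, take the rolling mean; assigning the result back to df
    # realigns it on the index, restoring the original order
    df['rollmean5'] = df[6].iloc[::-1].rolling(5, center=False, min_periods=1).mean()
else:
    df['rollmean5'] = df[6].rolling(5, center=False, min_periods=1).mean()
df[6] = df[6].fillna(df['rollmean5'].round(2))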
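The top-row problem described above is easy to reproduce on a few made-up values. In this minimal sketch, the trailing window ending at row 0 contains only the NaN, so `fillna` has nothing to insert there; reversing the series first gives row 0 a window made of the rows below it.

```python
import numpy as np
import pandas as pd

# Made-up column with a NaN in the top row
s = pd.Series([np.nan, 2.0, 4.0, 6.0])

# Window ending at row 0 holds only the NaN, so the mean stays NaN
forward = s.rolling(5, min_periods=1).mean()

# Reversed, the window for row 0 covers rows 3, 2, 1 and 0
backward = s.iloc[::-1].rolling(5, min_periods=1).mean()

print(s.fillna(forward)[0], s.fillna(backward)[0])
```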

Upvotes: 0

Joe Patten

Reputation: 1704

You can use rolling with a window of n to get the average of the current row and the previous n-1 rows, and then fill your NaN values with those averages:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[1,2,3,4,5,6,np.nan,8,9,10]})
df['rollmean5'] = df['col1'].rolling(5,center=False,min_periods=1).mean()
df['col1'] = df['col1'].fillna(df['rollmean5'])

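Using a rolling window of 5 gets the average of the current index and the previous 4. Because min_periods=1, NaN values inside the window are skipped, so at a NaN row the result is the mean of the four rows before it (4.5 at index 6 below).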

Output:

    col1    rollmean5
0   1.0     1.00
1   2.0     1.50
2   3.0     2.00
3   4.0     2.50
4   5.0     3.00
5   6.0     4.00
6   4.5     4.50
7   8.0     5.75
8   9.0     7.00
9   10.0    8.25

And obviously, you could drop the newly created rollmean5 when you are done.
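Both options can be sketched on the same sample data: dropping the helper column with `DataFrame.drop(columns=...)` after the fill, or passing the rolling mean straight to `fillna` so no helper column is ever created. This is a minimal sketch; `df2` is just an illustrative name for the second copy.

```python
import numpy as np
import pandas as pd

# Variant 1: helper column, dropped once it has been used
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})
df['rollmean5'] = df['col1'].rolling(5, min_periods=1).mean()
df['col1'] = df['col1'].fillna(df['rollmean5'])
df = df.drop(columns='rollmean5')

# Variant 2: same result without ever creating the helper column
df2 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})
df2['col1'] = df2['col1'].fillna(df2['col1'].rolling(5, min_periods=1).mean())
```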

Upvotes: 3

capohugo

Reputation: 171

This should work? Not sure if there is something already built into pandas. Had to ignore the cases where there are not 4 preceding rows.

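import pandas as pd

# Fill each NaN in column 6 (positional index 5) with the mean of the 4 rows
# above it; the first 4 rows have too few predecessors and are skipped
for i in range(4, len(df)):
    if pd.isna(df.iloc[i, 5]):
        df.iloc[i, 5] = df.iloc[i - 4:i, 5].mean()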

Note: it looks like your column '6' is actually at position 5, which is why the iloc uses 5 for the column.
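Rather than hard-coding the position, the column's position can be looked up from its label with `Index.get_loc`. A minimal sketch; `fill_with_previous_mean` is a hypothetical helper name and the sample frame is made up, with integer column labels as in the question.

```python
import numpy as np
import pandas as pd


def fill_with_previous_mean(df, col, n=4):
    """Fill each NaN in column `col` with the mean of the n rows above it.

    Rows with fewer than n rows above them are skipped.
    """
    pos = df.columns.get_loc(col)  # positional index of the column label
    for i in range(n, len(df)):
        if pd.isna(df.iloc[i, pos]):
            df.iloc[i, pos] = df.iloc[i - n:i, pos].mean()
    return df


# Hypothetical frame whose column labels are integers
frame = pd.DataFrame({5: [1.0] * 6, 6: [1.0, 2.0, 3.0, 4.0, np.nan, 6.0]})
fill_with_previous_mean(frame, 6)
```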

Upvotes: 1
