Reputation: 121
I have a dataframe where the data progresses in one-hour intervals, but one of the columns has a few NaN values. When I encounter a NaN, I would like the code to average the four hours prior to the NaN and then replace the NaN with that average.
I tried modifying the answer to this question: pandas DataFrame: replace nan values with average of columns, but that example takes the average of the whole column rather than part of it.
My Dataframe:
0 1 2 ... 6 7 8
3 12:53 2.778 ... -12.2 -16.7 69%
4 11:53 3.611 ... NaN -17.2 73%
5 10:53 2.778 ... -13.7 -17.2 73%
6 09:53 3.611 ... -13.3 -17.2 73%
7 08:53 2.778 ... -12.8 -16.7 76%
…
74 13:53 0.278 ... -15 -17.2 83%
Here is my modified attempt at selecting the last 4 values in the column:
for i in df.index:
    df[6].fillna(df[6].iloc[0:3].mean(), inplace=True)
This averages the first rows of column 6 (iloc[0:3] is positions 0 through 2, so only three rows), which explains the result I'm getting, but I do not know how to make it average the 4 rows after the NaN instead.
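A minimal sketch of the difference, assuming (as the timestamps suggest) that rows run newest-first, so the four hours prior to a NaN are the four rows below it. The series `s` is hypothetical and reuses the column 6 values from rows 3 to 7 shown in the question; the loop takes a window relative to each NaN's position instead of the fixed iloc[0:3].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for column 6, rows 3-7 of the question (newest hour first)
s = pd.Series([-12.2, np.nan, -13.7, -13.3, -12.8], index=range(3, 8))

# iloc[0:3] always selects positions 0, 1 and 2, wherever the NaN is
print(s.iloc[0:3].tolist())

# Instead, average the (up to) four rows after each NaN, i.e. the four prior hours
filled = s.copy()
for pos in np.flatnonzero(s.isna().to_numpy()):
    filled.iloc[pos] = s.iloc[pos + 1:pos + 5].mean()  # NaNs in the window are skipped
print(filled.round(2).tolist())
```

Only three rows follow the NaN in this small excerpt, so the mean uses those three; on the full frame the slice would cover four.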
Upvotes: 1
Views: 1017
Reputation: 121
Using the code provided by Joe Patten worked great until I had a NaN in the top row of the column. To solve this, I check whether the first value is NaN. If it is, I reverse the column, compute the rolling average from what was the bottom up, and then restore the original order. If the first value is not NaN, I run the rolling average code as is.
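import pandas as pd
# pd.isna returns a bool; the original `== 'True'` compared it to a string and was never true
if pd.isna(df[6].iloc[0]):
    # Reverse, average the current row plus up to 4 rows below it, then realign by index
    df['rollmean5'] = df[6].loc[::-1].rolling(5, center=False, min_periods=1).mean()
    df['rollmean5'] = df['rollmean5'].loc[::-1]
    df[6] = df[6].fillna(df['rollmean5'].round(2))
else:
    # Average the current row plus up to 4 rows above it
    df['rollmean5'] = df[6].rolling(5, center=False, min_periods=1).mean()
    df[6] = df[6].fillna(df['rollmean5'].round(2))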
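A small sketch of why the first row needs the reversed pass, using a hypothetical series `s` whose first value is missing. With a forward window, row 0 sees only itself, so its rolling mean is NaN and fillna has nothing to fill with; reversing lets row 0 see itself plus the four rows that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical column whose very first value is missing
s = pd.Series([np.nan, 2.0, 4.0, 6.0, 8.0, 10.0])

# Forward rolling mean: the window at row 0 holds only the NaN, so it stays NaN
forward = s.rolling(5, min_periods=1).mean()
print(forward.iloc[0])  # nan

# Reversed rolling mean: row 0 now sees itself plus the next four rows
backward = s.iloc[::-1].rolling(5, min_periods=1).mean().iloc[::-1]
filled = s.fillna(backward)
print(filled.iloc[0])  # 5.0
```

The reversed pass averages the rows below each NaN rather than above it, so which direction counts as "prior" depends on how the frame is sorted.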
Upvotes: 0
Reputation: 1704
You can use rolling with a window of n to get the rolling average of the current row and the previous n-1 rows, and then fill your NaNs with those values:
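import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})
# Mean of the current row and up to 4 rows before it; NaNs inside the window are skipped
df['rollmean5'] = df['col1'].rolling(5, center=False, min_periods=1).mean()
df['col1'] = df['col1'].fillna(df['rollmean5'])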
Using a rolling window of 5 averages the current row and the previous 4. Since the current row is the NaN and is skipped, the fill value is the mean of the 4 preceding values.
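The same fill can be written so that "previous 4" is explicit: shift the column down by one so the current row drops out of the window, then take a rolling mean of 4. A sketch; the helper name `fill_with_prior_mean` is hypothetical, not a pandas function.

```python
import numpy as np
import pandas as pd

def fill_with_prior_mean(s: pd.Series, n: int = 4) -> pd.Series:
    """Fill each NaN with the mean of the (up to) n values before it."""
    # shift(1) moves every value down one row, so the window holds only prior rows
    prior_mean = s.shift(1).rolling(n, min_periods=1).mean()
    return s.fillna(prior_mean)

col1 = pd.Series([1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10])
print(fill_with_prior_mean(col1).tolist())
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 4.5, 8.0, 9.0, 10.0]
```

For a NaN row both forms give the same value, because the NaN contributes nothing to the window of 5.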
Output:
col1 rollmean5
0 1.0 1.00
1 2.0 1.50
2 3.0 2.00
3 4.0 2.50
4 5.0 3.00
5 6.0 4.00
6 4.5 4.50
7 8.0 5.75
8 9.0 7.00
9 10.0 8.25
And obviously, you could drop the newly created rollmean5 column when you are done.
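If you don't need the helper column at all, a sketch that uses the rolling mean inline with the same frame and window, so rollmean5 is never created:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 6, np.nan, 8, 9, 10]})
# Same window of 5 with min_periods=1, passed straight to fillna instead of stored
df['col1'] = df['col1'].fillna(df['col1'].rolling(5, min_periods=1).mean())
print(df.columns.tolist())  # ['col1']

# If the helper column was already created, it can be removed with:
# df = df.drop(columns='rollmean5')
```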
Upvotes: 3
Reputation: 171
This should work, though I'm not sure whether something is already built into pandas. I had to skip the cases where there are fewer than 4 preceding rows.
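import pandas as pd
col = 5  # positional index of column '6' (see note below)
for i in range(4, len(df)):  # skip rows with fewer than 4 preceding rows
    if pd.isna(df.iloc[i, col]):
        # Fill only this NaN; calling fillna on the whole column would fill every NaN with one mean
        df.iloc[i, col] = df.iloc[i - 4:i, col].mean()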
Note: it looks like your column '6' is actually at position 5, which is why the iloc uses 5 for the column.
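Rather than guessing the position, it can be looked up with `df.columns.get_loc`. A minimal sketch with a hypothetical frame whose column labels start at 1, so label 6 lands at position 5; in your own frame the result depends on how the columns were created.

```python
import pandas as pd

# Hypothetical frame: column labels 1..8, so label 6 sits at position 5
df = pd.DataFrame([[10, 20, 30, 40, 50, 60, 70, 80]], columns=range(1, 9))
pos = df.columns.get_loc(6)
print(pos)  # 5

# Label-based and position-based selection now pick the same column
assert df[6].equals(df.iloc[:, pos])
```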
Upvotes: 1