I have to replace the NaN values in my dataset, which is organised like this:
SUBDIVISION: ANDAMAN & NICOBAR ISLANDS
YEAR: 1901
JAN: 49.20
FEB: 87.10
MAR: NaN
APR: 19.10
MAY: 89.99
(In reality the dataset contains all the months of the year, and the data are laid out in columns.)
I need to replace each NaN with the mean of the same month in the previous year (1900) and the next year (1902). If possible, I would also like to include the value of the next month of the same year (APR 1901) and take the mean of these three values.
import pandas as pd
import numpy as np
import statistics

# Loop through rows of dataframe by index i.e. from 0 to number of rows
for i in range(0, df.shape[0]):
    for j in range(1, df.shape[1]):  # iterate over columns
        if pd.isna(df.iloc[i,j]):
            adjacentYearBefore = df.iloc[i-1,j].mean()
            adjacentYearAfter= df.iloc[i+1,j].mean()
            #avgYear = ((np.float64(adjacentYearBefore)) + (np.float64(adjacentYearAfter))) / 2.0
            avgYear = (adjacentYearBefore + adjacentYearAfter).mean()
            print(avgYear)
            df.iloc[i,j] = df.iloc[i,j].fillna(avgYear)
But it returns this error:
AttributeError Traceback (most recent call last)
<ipython-input-29-32c064fa8ca4> in <module>()
11 avgYear = (adjacentYearBefore + adjacentYearAfter).mean()
12 print(avgYear)
---> 13 df.iloc[i,j] = df.iloc[i,j].fillna(avgYear)
14 #df.iloc[i,j].ffill(inplace=True)
15 #
AttributeError: 'numpy.float64' object has no attribute 'fillna'
I have of course read the similar questions on Stack Overflow, but I still could not solve my problem.
UPDATE:
df.head(10)
Upvotes: 2
Views: 1907
Reputation: 3583
My dataframe looks like this:
annual
1901 3373.2
1902 3520.7
1903 2957.4
1904 3079.6
1905 2566.7
1906 2534.4
1907 NaN
1908 3576.4
1909 2899.4
1910 2687.2
How do I fill in the nan with the average of the year before and after?
df.interpolate()
gives
annual
1901 3373.2
1902 3520.7
1903 2957.4
1904 3079.6
1905 2566.7
1906 2534.4
1907 3055.4
1908 3576.4
1909 2899.4
1910 2687.2
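Have I guessed correctly what you're asking, and are you happy with the solution? Notice, by the way, that this uses the default interpolation method, method='linear'. For a single missing value between two known values, linear interpolation gives exactly the average of the year before and the year after.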
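As a quick check of the claim above, here is a minimal sketch that rebuilds the sample frame shown in this answer and compares the interpolated value for 1907 with the plain average of 1906 and 1908. The variable names `filled` and `neighbour_mean` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above; 1907 is the missing year.
df = pd.DataFrame(
    {"annual": [3373.2, 3520.7, 2957.4, 3079.6, 2566.7,
                2534.4, np.nan, 3576.4, 2899.4, 2687.2]},
    index=range(1901, 1911),
)

# interpolate() uses method="linear" unless told otherwise.
filled = df.interpolate()

# Average of the year before and the year after, computed by hand.
neighbour_mean = (df.loc[1906, "annual"] + df.loc[1908, "annual"]) / 2

print(filled.loc[1907, "annual"], neighbour_mean)
```

Note that the two only agree for an isolated NaN. If several consecutive years are missing, linear interpolation spaces the filled values evenly between the surrounding known values rather than using a neighbour average.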
There are a few things to notice about your code that you may be able to learn from. The .mean() calls in
adjacentYearBefore = df.iloc[i-1,j].mean()
adjacentYearAfter = df.iloc[i+1,j].mean()
don't do anything, since you're taking the mean of a single value.
The same is true for
avgYear = (adjacentYearBefore + adjacentYearAfter).mean()
Notice that you first add the two values and then take the mean of that single sum, so you never divide by two.
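To make the arithmetic concrete, here is a small sketch using two values from the sample table above as stand-ins for the neighbouring years. The names `before`, `after`, `wrong` and `right` are only illustrative.

```python
import numpy as np

# Two scalar values, as returned by df.iloc[i, j] on a float column.
before = np.float64(2534.4)
after = np.float64(3576.4)

# .mean() on a single number just returns that number,
# so this is still the sum, not the average.
wrong = (before + after).mean()

# Dividing by two gives the actual average.
right = (before + after) / 2

print(wrong, right)
```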
And finally, in
df.iloc[i,j] = df.iloc[i,j].fillna(avgYear)
you are taking one value and calling fillna on it. That is not possible, since fillna is supposed to be called on a Series or DataFrame to fill in all of its missing values. Here you can just assign the value:
for i in range(0, df.shape[0]):
    for j in range(1, df.shape[1]):  # iterate over columns
        if pd.isna(df.iloc[i,j]):
            adjacentYearBefore = df.iloc[i-1,j]
            adjacentYearAfter = df.iloc[i+1,j]
            avgYear = (adjacentYearBefore + adjacentYearAfter)/2
            print(avgYear)
            df.iloc[i,j] = avgYear
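The loop above can also be written without explicit iteration, and the same idea extends to the three-value mean asked about in the question (previous year, next year, and next month of the same year). The following is only a sketch under some assumptions: the frame is hypothetical, it holds nothing but numeric month columns, it is indexed by year, and it covers a single subdivision. With several subdivisions in one frame, the shifts would have to be done per group so that values do not leak across subdivisions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows are years, columns are months (all numeric).
df = pd.DataFrame(
    {
        "FEB": [80.0, 87.1, 90.0],
        "MAR": [30.0, np.nan, 50.0],
        "APR": [10.0, 19.1, 20.0],
    },
    index=[1900, 1901, 1902],
)

prev_year = df.shift(1)            # same month, previous row (year)
next_year = df.shift(-1)           # same month, next row (year)
next_month = df.shift(-1, axis=1)  # same year, next column (month)

# Mean of the previous and next year; fillna only touches the NaN cells.
filled_two = df.fillna((prev_year + next_year) / 2)

# Mean of previous year, next year and next month of the same year.
filled_three = df.fillna((prev_year + next_year + next_month) / 3)

print(filled_two.loc[1901, "MAR"], filled_three.loc[1901, "MAR"])
```

Unlike the loop, this leaves a NaN in place when one of its neighbours does not exist or is itself missing (for example in the first or last year), instead of wrapping around to the other end of the frame or raising an IndexError.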
Upvotes: 4