user13140381

Fill NaN value with the mean of the previous and the next row - Python

I need to replace the NaN values in my dataset, which is organised like this:

SUBDIVISION: ANDAMAN & NICOBAR ISLANDS
YEAR: 1901 JAN: 49.20 FEB: 87.10 MAR: NaN APR: 19.10 MAY: 89.99

(In reality all twelve months of the year are present, and the data are laid out in columns.)

I need to replace each NaN with the mean of the same month in the previous year (1900) and the next year (1902). If possible, I would also like to include the next month of the same year (APR 1901) in this operation, taking the mean of these three values.
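One possible vectorized sketch of that three-value mean. It assumes a wide DataFrame indexed by year that holds only the month columns (SUBDIVISION removed, YEAR set as the index) for a single subdivision; the toy values are hypothetical. It averages the same month in the previous and next year with the next month of the same year, and skips any neighbour that is itself missing, which is a design choice rather than something the question specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row per year, one column per month
df = pd.DataFrame(
    {"JAN": [40.0, 49.2, 52.0], "FEB": [80.0, 87.1, 90.0],
     "MAR": [30.0, np.nan, 50.0], "APR": [20.0, 19.1, 18.0]},
    index=[1900, 1901, 1902],
)

prev_year = df.shift(1)             # same month, previous year (row above)
next_year = df.shift(-1)            # same month, next year (row below)
next_month = df.shift(-1, axis=1)   # next month (column to the right), same year

# Stack the three neighbour frames and average them cell by cell;
# groupby(...).mean() ignores neighbours that are NaN
neighbour_mean = (
    pd.concat([prev_year, next_year, next_month]).groupby(level=0).mean()
)

# Only the missing cells are replaced; existing values are left untouched
filled = df.fillna(neighbour_mean)
print(filled)
```

For 1901 MAR this averages MAR 1900, MAR 1902 and APR 1901. In the last month column, or the first and last years, the missing neighbour is simply left out of the mean. With several subdivisions in one frame, the shifts would have to be applied per subdivision so that years from different regions are not mixed.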

import pandas as pd
import numpy as np
import statistics
# Loop through rows of dataframe by index i.e. from 0 to number of rows
for i in range(0, df.shape[0]):
    for j in range(1, df.shape[1]):  # iterate over columns
        if pd.isna(df.iloc[i, j]):
            adjacentYearBefore = df.iloc[i-1, j].mean()
            adjacentYearAfter = df.iloc[i+1, j].mean()
            # avgYear = ((np.float64(adjacentYearBefore)) + (np.float64(adjacentYearAfter))) / 2.0
            avgYear = (adjacentYearBefore + adjacentYearAfter).mean()
            print(avgYear)
            df.iloc[i, j] = df.iloc[i, j].fillna(avgYear)

But it returns this error:

AttributeError                            Traceback (most recent call last)
<ipython-input-29-32c064fa8ca4> in <module>()
     11      avgYear = (adjacentYearBefore + adjacentYearAfter).mean()
     12      print(avgYear)
---> 13      df.iloc[i,j] = df.iloc[i,j].fillna(avgYear)
     14      #df.iloc[i,j].ffill(inplace=True)
     15      #

AttributeError: 'numpy.float64' object has no attribute 'fillna'

I have, of course, read the similar questions on Stack Overflow, but I still could not solve my problem.

UPDATE:

df.head(10)

[image: output of df.head(10)]


Answers (1)

Lukas S

My guess at what you're actually asking:

My dataframe looks like this:

      annual
1901  3373.2
1902  3520.7
1903  2957.4
1904  3079.6
1905  2566.7
1906  2534.4
1907     NaN
1908  3576.4
1909  2899.4
1910  2687.2

How do I fill in the NaN with the average of the year before and the year after?

Answer:

df.interpolate()

gives

      annual
1901  3373.2
1902  3520.7
1903  2957.4
1904  3079.6
1905  2566.7
1906  2534.4
1907  3055.4
1908  3576.4
1909  2899.4
1910  2687.2

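Have I guessed correctly what you're asking, and are you happy with this solution? Note, by the way, that this uses the default interpolation method, method='linear', which treats the values as equally spaced and ignores the index. For a single NaN between two known values, that is exactly the mean of the year before and the year after.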
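A minimal runnable sketch of this answer, rebuilding the sample frame shown above. The second part uses a hypothetical series (not from the question) in which one year is missing from the index entirely, to show where the default method='linear' and method='index' give different results.

```python
import numpy as np
import pandas as pd

# The sample data from this answer: annual totals indexed by year
df = pd.DataFrame(
    {"annual": [3373.2, 3520.7, 2957.4, 3079.6, 2566.7,
                2534.4, np.nan, 3576.4, 2899.4, 2687.2]},
    index=range(1901, 1911),
)
filled = df.interpolate()            # default method="linear"
print(filled.loc[1907, "annual"])    # the mean of the 1906 and 1908 values

# Hypothetical series: 2001 is NaN and the 2002 row does not exist at all
s = pd.Series([10.0, np.nan, 40.0], index=[2000, 2001, 2003])
print(s.interpolate())                # linear, by position: fills 2001 with 25.0
print(s.interpolate(method="index"))  # weighted by year distance: fills 2001 with 20.0
```

If every year is present, as in the question's data, the two methods agree; method="index" only matters when rows for some years are absent.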

Further question: What is wrong with my code?

There are a few things to notice about your code that you may be able to learn from. The .mean() calls in

     adjacentYearBefore = df.iloc[i-1,j].mean()
     adjacentYearAfter = df.iloc[i+1,j].mean()

don't do anything, since you're taking the mean of a single value.

The same is true for

     avgYear = (adjacentYearBefore + adjacentYearAfter).mean()

Notice that you first add the two values and then take the mean of that single sum, so you never divide by two.
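Both points can be checked with a pair of NumPy scalars. The values are borrowed from the sample data above, and numpy.float64 is the type that df.iloc[i, j] returns for a float column (as the traceback shows).

```python
import numpy as np

# Hypothetical neighbour values, in the type df.iloc[i, j] returns
before = np.float64(2534.4)
after = np.float64(3576.4)

print(before.mean() == before)   # True: the mean of one value is that value
print((before + after).mean())   # still just the sum, never divided by two
print((before + after) / 2)      # the intended average of the two years
```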

And finally in

     df.iloc[i,j] = df.iloc[i,j].fillna(avgYear)

you take a single value (a numpy.float64) and call fillna on it. That is not possible, since fillna is a method of a Series or DataFrame and fills in all of their missing values at once. Here you can simply assign the value.
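A small sketch of the difference, using a hypothetical one-column frame: a single cell selected with iloc is a NumPy scalar without fillna, a whole column is a Series that has it, and a single cell can just be assigned.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame with a gap in 1901
df = pd.DataFrame({"MAR": [30.0, np.nan, 50.0]}, index=[1900, 1901, 1902])

cell = df.iloc[1, 0]      # a single cell: numpy.float64
column = df.iloc[:, 0]    # a whole column: pandas Series
print(hasattr(cell, "fillna"), hasattr(column, "fillna"))  # False True

df.iloc[1, 0] = 40.0      # for one cell, plain assignment is enough
print(df)
```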

Working version of your code:

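import pandas as pd

for i in range(df.shape[0]):
    for j in range(1, df.shape[1]):  # iterate over columns, skipping the first
        if pd.isna(df.iloc[i, j]):
            # The first and last rows have no neighbour on one side
            # (df.iloc[-1, j] would silently wrap around to the last row)
            if i == 0 or i == df.shape[0] - 1:
                continue
            adjacentYearBefore = df.iloc[i - 1, j]
            adjacentYearAfter = df.iloc[i + 1, j]
            avgYear = (adjacentYearBefore + adjacentYearAfter) / 2
            print(avgYear)
            df.iloc[i, j] = avgYear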
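The same two-neighbour mean can also be written without explicit loops. This is a sketch assuming a frame indexed by year that contains only numeric month columns (the toy values are hypothetical). One difference from the loop: the loop can reuse a value it filled one row earlier, while this version only uses the original values, so the two differ only when a month has NaNs in two consecutive years.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows are consecutive years, columns are months
df = pd.DataFrame(
    {"JAN": [49.2, np.nan, 52.0], "FEB": [87.1, 90.0, np.nan]},
    index=[1900, 1901, 1902],
)

# Mean of the same month in the previous and next year, for every cell at once
neighbour_mean = (df.shift(1) + df.shift(-1)) / 2

# Replace only the missing cells; first/last years stay NaN (no neighbour)
filled = df.fillna(neighbour_mean)
print(filled)
```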
