Yazan Alatoom

Reputation: 185

How to deal with missing data in Python

I have a data set stored in a .txt file as follows:

Elevation(ROAD)       Interval    
1.3                    1
3.3                    2
4.1                    3
-1.5                   4
NA                     5
NA                     6
6.8                    7
2.1                    8
5.1                    9
NA                     10
6.1                    11
NA                     12
NA                     13
NA                     14

Is there a way to interpolate these missing values (NA) using Python, for example with an averaging technique?

Upvotes: 1

Views: 92

Answers (3)

farbiondriven

Reputation: 2468

If, for any reason, you can't use external libraries:

file_content = """1.3
3.3
4.1
-1.5
NA
NA
6.8
2.1
5.1
NA
6.1
NA
NA
NA
7.1
NA"""

def isfloat(value):
  try:
    float(value)
    return True
  except ValueError:
    return False

class ParsedList:
  def __init__(self):
    self.list = []
    self.holes = {} # index key, value length

  def set_value(self, number):
    if isfloat(number):
      self.list.append(float(number))
    else:
      key = len(self.list)-1
      if key in self.holes:
        self.holes[key] += 1
      else:
        self.holes[key] = 1

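  def interpolate(self):
    output = list(self.list)
    offset=0

    # holes is keyed by the index of the last valid value before each gap;
    # dicts keep insertion order (Python 3.7+), so gaps are visited left to right.
    for index, size in self.holes.items():
      if index < 0:
        # Leading gap: no left neighbour, so repeat the first known value.
        delta = 0
        init_value = self.list[0]
      elif index < len(self.list)-1:
        # Interior gap: step linearly between the two neighbouring values.
        delta = (self.list[index+1] - self.list[index])/(size+1)
        init_value = self.list[index]
      else:
        # Trailing gap: no right neighbour, so repeat the last known value.
        delta =0
        init_value = self.list[-1]
      for i in range(size):
        output.insert(index+i+1+offset, init_value+delta*(i+1))
      offset+=size
    return output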

# test:
parsed_list = ParsedList() 
for x in file_content.splitlines():
  parsed_list.set_value(x)

for x in parsed_list.interpolate():
  print(x)

Upvotes: 1

Ransaka Ravihara

Reputation: 1994

Assuming your pandas DataFrame is named df, this replaces every NA with the column mean:

df['Elevation'] = df['Elevation'].fillna(df['Elevation'].mean())

Try this out!
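To make the effect of the mean fill concrete, here is a minimal sketch using the elevation values from the question. The column name `Elevation` and the variable names are assumptions for illustration (the file header in the question is `Elevation(ROAD)`). Note that this puts the same value into every gap, so it is a constant fill rather than a true interpolation between neighbours.

```python
import numpy as np
import pandas as pd

# Sample elevations from the question; NA is represented as NaN.
# The column name "Elevation" is an assumption for this sketch.
df = pd.DataFrame({
    "Elevation": [1.3, 3.3, 4.1, -1.5, np.nan, np.nan, 6.8,
                  2.1, 5.1, np.nan, 6.1, np.nan, np.nan, np.nan],
})

# mean() skips NaN by default, so this is the mean of the 8 known values.
mean_elevation = df["Elevation"].mean()

# Assign the result to a new Series; every gap receives the same value.
filled = df["Elevation"].fillna(mean_elevation)

print(mean_elevation)
print(filled)
```

If the gaps should follow the trend of the neighbouring points instead, a linear interpolation is probably closer to what the question asks for.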

Upvotes: 1

tags

Reputation: 4059

You don't provide much detail, and you don't show any code either.

One simple way to get what you want is to create a pandas.Series() to which you apply the interpolate function (google for it if you need specific interpolation settings; they may be slightly different depending on the pandas version you are using).

(My understanding is that your Interval column is a simple dataframe index).

import pandas as pd
import numpy as np
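# NA values from the file are represented as np.nan
data = [1.3, 3.3, 4.1, -1.5, np.nan, np.nan, 6.8, 2.1, 5.1, np.nan, 6.1, np.nan, np.nan, np.nan]
ser = pd.Series(data)
# interpolate() is linear by default and returns a new Series
print(ser.interpolate())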
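Since the data actually lives in a text file, the same idea can be sketched starting from the file contents. This is only an illustration under a few assumptions: the columns are separated by whitespace, the header is exactly as shown in the question, and `StringIO` stands in for the real file (with an actual file you would pass its path to `read_csv` instead).

```python
from io import StringIO

import pandas as pd

# Stand-in for the text file from the question (an assumption for this
# sketch); with a real file, pass its path to read_csv instead.
raw = """Elevation(ROAD) Interval
1.3 1
3.3 2
4.1 3
-1.5 4
NA 5
NA 6
6.8 7
2.1 8
5.1 9
NA 10
6.1 11
NA 12
NA 13
NA 14
"""

# Split on any run of whitespace and treat the string "NA" as missing.
df = pd.read_csv(StringIO(raw), sep=r"\s+", na_values=["NA"])

# Linear interpolation (the default method) between neighbouring values.
elevation = df["Elevation(ROAD)"]
interpolated = elevation.interpolate()

print(interpolated)
```

Interior gaps are filled on a straight line between their neighbours. As far as I know, with the default settings the trailing NA values have no right-hand neighbour and are filled with the last valid value; passing `limit_area="inside"` should leave them as NaN if that is not what you want.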

Upvotes: 2
