Reputation: 401
My sample dataframe consists of:
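import pandas as pd
dictx = {'col': [20, 'nan', 22, 'nan', 'nan', 'nan', 30, 'nan', 28, 'nan', 25]}
df = pd.DataFrame(dictx).astype(float)  # the 'nan' strings become NaN
df = df.reset_index()  # adds an 'index' column at position 0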
PART 1
I need to fill the missing values with the mean of the two neighbouring values on either side, e.g.
df1 = df.iloc[:3,[1]]
col
0 20.0
1 NaN
2 22.0
The value at index 1 should be 21.
This problem will reappear in other situations that need the same treatment.
PART 2
Or, when there is more than one consecutive NaN, I need to take the values from a straight line drawn between the two surrounding points, as follows:
df2 = df.iloc[2:7,[1]]
col
2 22.0
3 NaN
4 NaN
5 NaN
6 30.0
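x = df.iat[6,1]    # value at the right end of the gap
x0 = df.iat[2,1]   # value at the left end of the gap
y = df.iat[6,0]    # index at the right end of the gap
y0 = df.iat[2,0]   # index at the left end of the gap
slope = (x - x0)/(y - y0)
value = x0 + slope*(i - y0)   # i is the index of the row being filled (3, 4 or 5)
So the value would be different for each index.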
My objectives are:
The real dataframe is constantly changing and has 1440 rows, so this problem repeats over and over.
I mainly need help with part 1, because I can apply a similar approach to part 2 using the same logic.
Upvotes: 1
Views: 106
Reputation: 153460
I think you are trying to do linear interpolation; use interpolate:
Let's try:
df.interpolate()
Output:
    index   col
0       0  20.0
1       1  21.0
2       2  22.0
3       3  24.0
4       4  26.0
5       5  28.0
6       6  30.0
7       7  29.0
8       8  28.0
9       9  26.5
10     10  25.0
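For reference, here is a self-contained version as a minimal sketch. It assumes the rows are evenly spaced, which is what the default method="linear" assumes too. The names filled, pos, known and manual are only illustrative and not from the question. The np.interp cross-check shows that both the single-NaN case (part 1) and the multi-NaN case (part 2) are the same straight-line fill.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question; np.nan marks the gaps.
df = pd.DataFrame(
    {"col": [20, np.nan, 22, np.nan, np.nan, np.nan, 30, np.nan, 28, np.nan, 25]}
)

# Default method="linear" treats rows as equally spaced and fills each run
# of NaNs on a straight line between the surrounding valid values.
filled = df["col"].interpolate()

# Cross-check by hand: interpolate over row positions using only the
# rows that already have a value.
pos = np.arange(len(df))
known = df["col"].notna().to_numpy()
manual = np.interp(pos, pos[known], df["col"].to_numpy()[known])

print(filled.tolist())
print(np.allclose(filled.to_numpy(), manual))
```

Since the real frame keeps changing, calling interpolate again on the updated column should be enough; no per-gap logic is needed.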
Upvotes: 1