Yasmin

Reputation: 951

How can you interpolate NaN values if first few rows have NaN values?

Description: I am trying to interpolate missing values (represented as NaN), but the method only works on NaN values that lie between known values. I am also quite confused about how the missing values are computed by bfill. As I understand it, bfill only fills each missing value with the first succeeding known value. Here's an example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([['M', '2014-01-01 00:26:00', '2'], ['M', 'M', 'M'], ['M', '2014-01-01 00:26:30', 9],[5, '2014-01-01 00:26:50', 'M'],[6, '2014-01-01 00:26:50', 'M']], columns=['x','y','z'])
>>> df
   x                    y  z
0  M  2014-01-01 00:26:00  2
1  M                    M  M
2  M  2014-01-01 00:26:30  9
3  5  2014-01-01 00:26:50  M
4  6  2014-01-01 00:26:50  M
>>> df = df.replace(['M'],[np.nan])
>>> df
    x                    y    z
0 NaN  2014-01-01 00:26:00    2
1 NaN                  NaN  NaN
2 NaN  2014-01-01 00:26:30    9
3   5  2014-01-01 00:26:50  NaN
4   6  2014-01-01 00:26:50  NaN
>>> df['x'] = df['x'].astype(np.float64)
>>> df['z'] = df['z'].astype(np.float64)
>>> df['y'] = pd.to_datetime(df['y'])
>>> df
    x                   y   z
0 NaN 2014-01-01 00:26:00   2
1 NaN                 NaT NaN
2 NaN 2014-01-01 00:26:30   9
3   5 2014-01-01 00:26:50 NaN
4   6 2014-01-01 00:26:50 NaN
>>> df.interpolate()
    x                   y    z
0 NaN 2014-01-01 00:26:00  2.0
1 NaN                 NaT  5.5
2 NaN 2014-01-01 00:26:30  9.0
3   5 2014-01-01 00:26:50  9.0
4   6 2014-01-01 00:26:50  9.0
>>> df.interpolate(method='bfill')  # try to fill the first three rows in x
    x                   y   z
0   2 2014-01-01 00:26:00   2
1 NaN                 NaT NaN
2   9 2014-01-01 00:26:30   9
3   5 2014-01-01 00:26:50 NaN
4   6 2014-01-01 00:26:50 NaN

Goal: I want to fill x and z and, if possible, also fill y, which has a datetime type.

Upvotes: 0

Views: 2133

Answers (1)

Anton Protopopov

Reputation: 31662

IIUC, you could use interpolate to get the values for the z column and then fillna with bfill for whatever is still missing:

In [122]: df.interpolate().fillna(method='bfill')
Out[122]:
   x                   y    z
0  5 2014-01-01 00:26:00  2.0
1  5 2014-01-01 00:26:30  5.5
2  5 2014-01-01 00:26:30  9.0
3  5 2014-01-01 00:26:50  9.0
4  6 2014-01-01 00:26:50  9.0

Or:

In [128]: df.fillna(method='bfill').interpolate()
Out[128]:
   x                   y  z
0  5 2014-01-01 00:26:00  2
1  5 2014-01-01 00:26:30  9
2  5 2014-01-01 00:26:30  9
3  5 2014-01-01 00:26:50  9
4  6 2014-01-01 00:26:50  9
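For readers on newer pandas releases, where `fillna(method=...)` is deprecated in favour of `.bfill()`, the two orderings above can be sketched as follows. This is a minimal sketch rather than the answerer's exact code: it assumes only the numeric columns x and z from the question, and the variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Numeric columns from the question (y is left out of this sketch)
df = pd.DataFrame({
    "x": [np.nan, np.nan, np.nan, 5.0, 6.0],
    "z": [2.0, np.nan, 9.0, np.nan, np.nan],
})

# Interpolate first: the interior gap in z gets the linear midpoint (5.5)
# and the trailing gaps repeat the last known value; bfill then copies the
# next valid value into the leading NaNs of x.
interp_then_bfill = df.interpolate().bfill()

# Backfill first: every gap that has a later known value is copied from it,
# so the interior gap in z becomes 9.0; interpolate then fills the trailing
# NaNs in z that bfill could not reach.
bfill_then_interp = df.bfill().interpolate()

print(interp_then_bfill)
print(bfill_then_interp)
```

Both variants fill x identically; they differ only in row 1 of z (5.5 versus 9).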

The order of the two methods depends on how you want to fill the last column.
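Note that in both outputs above the missing timestamp in y is backfilled (copied from the next row), not interpolated. If a linearly interpolated timestamp is wanted instead, one possible approach is to convert the column to elapsed seconds, interpolate those numbers, and convert back. The sketch below assumes this is acceptable for the data; `origin`, `seconds` and `y_filled` are illustrative names, not part of the original answer.

```python
import pandas as pd

# The y column from the question, with the missing entry as NaT
y = pd.Series(pd.to_datetime([
    "2014-01-01 00:26:00",
    None,
    "2014-01-01 00:26:30",
    "2014-01-01 00:26:50",
    "2014-01-01 00:26:50",
]))

# Use the earliest known timestamp as a reference point (min() skips NaT)
origin = y.min()

# Elapsed seconds since the reference; NaT becomes NaN in the float series
seconds = (y - origin).dt.total_seconds()

# Linear interpolation on plain floats, then convert back to timestamps
y_filled = origin + pd.to_timedelta(seconds.interpolate(), unit="s")

print(y_filled)
```

With this data the missing entry lands halfway between its neighbours, at 00:26:15, whereas bfill would give 00:26:30.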

Upvotes: 3
