harbun

Reputation: 525

Pandas replace values in dataframe timeseries

I have a pandas dataframe df with pandas.tseries.index.DatetimeIndex as index.

The data is like this:

Time                    Open     High      Low    Close     Volume
2007-04-01 21:02:00    1.968    2.389    1.968    2.389  18.300000
2007-04-01 21:03:00  157.140  157.140  157.140  157.140   2.400000

....

I want to replace one data point, let's say 2.389 in column Close, with NaN:

In: df["Close"].replace(2.389, np.nan)
Out: 2007-04-01 21:02:00      2.389
     2007-04-01 21:03:00    157.140

replace did not change 2.389 to NaN. What's wrong?

Upvotes: 5

Views: 1740

Answers (2)

unutbu

Reputation: 881037

replace might not match floats, because the rounded value you see in the repr of the DataFrame is not necessarily the same as the underlying float. For example, the actual Close value might be:

In [141]: df = pd.DataFrame({'Close': [2.389000000001]})

yet the repr of df looks like:

In [142]: df
Out[142]: 
   Close
0  2.389
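To make the mismatch visible, here is a minimal sketch that reuses the hypothetical value 2.389000000001 from above. It compares the stored float against 2.389 exactly and prints the float at full precision. The variable names are only illustrative.

```python
import pandas as pd

# Hypothetical stored value, slightly off from the 2.389 shown in the repr.
s = pd.Series([2.389000000001], name="Close")

# Exact equality fails because the stored float is not 2.389.
matches_exactly = bool((s == 2.389).any())

# Converting to a plain Python float and taking repr shows every digit.
full_precision = repr(float(s.iloc[0]))

print(matches_exactly)
print(full_precision)
```

An exact-match replace(2.389, np.nan) relies on the same equality comparison, so it would leave this value untouched.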

So instead of checking floats for exact equality, it is usually better to check for closeness:

In [150]: import numpy as np
In [151]: mask = np.isclose(df['Close'], 2.389)

In [152]: mask
Out[152]: array([ True], dtype=bool)

You can then use the boolean mask to select and change the desired values:

In [145]: df.loc[mask, 'Close'] = np.nan

In [146]: df
Out[146]: 
   Close
0    NaN
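Putting the steps together on a frame shaped like the one in the question, a sketch might look like the following. The timestamps and the 157.140 row come from the question. The slightly-off value 2.389000000001 and the tolerance atol=1e-6 are assumptions chosen for illustration, so pick a tolerance that suits your data.

```python
import numpy as np
import pandas as pd

# Two rows from the question; the first Close value is assumed to be
# slightly off from 2.389 (hypothetical, to mimic the rounding issue).
idx = pd.to_datetime(["2007-04-01 21:02:00", "2007-04-01 21:03:00"])
df = pd.DataFrame({"Close": [2.389000000001, 157.140]}, index=idx)

# Boolean mask of values within an absolute tolerance of 2.389.
# rtol=0.0 disables the relative term so only atol applies.
mask = np.isclose(df["Close"].to_numpy(), 2.389, rtol=0.0, atol=1e-6)

# Overwrite only the matching rows.
df.loc[mask, "Close"] = np.nan

print(df)
```

An explicit atol makes the matching rule easy to reason about for prices quoted to three decimals. The default np.isclose tolerances would also catch this particular value.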

Upvotes: 6

EdChum

Reputation: 394469

You need to assign the result back to df['Close'], or pass the parameter inplace=True: df['Close'].replace(2.389, np.nan, inplace=True)

e.g.:

In [5]:

df['Close'] = df['Close'].replace(2.389, np.nan)
df['Close']
Out[5]:
0      2.389
1    157.140
Name: Close, dtype: float64

Most pandas operations return a new object rather than modifying the original, and some accept the parameter inplace.
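A small sketch of the copy behaviour, assuming the stored value is exactly 2.389 (if it is only close to 2.389, the exact-match replace will not find it):

```python
import numpy as np
import pandas as pd

# Assumption: the stored float is exactly 2.389, so replace can match it.
df = pd.DataFrame({"Close": [2.389, 157.140]})

# replace returns a new Series; df itself is not modified by this call.
replaced = df["Close"].replace(2.389, np.nan)
value_before_assignment = float(df["Close"].iloc[0])

# Assigning the result back is what actually updates the DataFrame.
df["Close"] = replaced

print(value_before_assignment)
print(df)
```

Assigning the result back keeps the update explicit and does not depend on how inplace behaves on a column selected from a DataFrame.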

Check the docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html#pandas.Series.replace

Upvotes: 3
