Reputation: 4864
I have a dataframe as in the image. (the numbers are really objects). Doing df.treasury_rate = pd.to_numeric(df.treasury_rate)
, predictably bombs. However, doing df.replace('.', np.nan)
does not appear to get rid of the dot, so I am flummoxed. Any suggestions?
UPDATE pd.to_numeric
takes an errors
keyword, which, when set to coerce
does the right thing, however, I am still confused as to why the .
is not getting replaced.
UPDATE 2 As text:
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 .
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.02
Upvotes: 2
Views: 7564
Reputation: 13582
Given OP's dataframe
import pandas as pd
import datetime as dt
df = pd.DataFrame({'treasury_rate': [4.05, 4.05, 4.05, ".", 4.03, 4.03, 4.02, 4.02, 4.02, 4.0]},
index=[dt.datetime(1962, 2, 9), dt.datetime(1962, 2, 10), dt.datetime(1962, 2, 11), dt.datetime(1962, 2, 12), dt.datetime(1962, 2, 13), dt.datetime(1962, 2, 14), dt.datetime(1962, 2, 15), dt.datetime(1962, 2, 16), dt.datetime(1962, 2, 17), dt.datetime(1962, 2, 18)])
[Out]:
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 .
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.0
There are various ways one can go about replacing the dot with NaN.
Using regular expression (with Python's re
)
import re
df_new = df.replace(re.compile('\.'), np.nan)
Using a custom lambda function and np.nan
import numpy as np
df_new = df.treasury_rate.apply(lambda x: np.nan if x == '.' else x)
Using pandas.DataFrame.replace
and np.nan
import numpy as np
df_new = df.replace('.', np.nan)
# or
df_new = df.replace(to_replace=r'\.', value=np.nan, regex=True)
In all three cases the output is
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 NaN
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.00
Notes:
If one wants to do the operations on the same dataframe, one can use inplace=True
such as
df.replace(re.compile('\.'), np.nan)
Pandas has their own null value. So, instead of np.nan
, one might want to use pd.NA
. For more information on the difference between those two: pd.NA vs np.nan for pandas
If one uses pd.NA
, the output would be
treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 <NA>
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.0
There are strong opinions on using .apply
. If one wants to read some: When should I (not) want to use pandas apply() in my code?
Upvotes: 0
Reputation: 527
df.replace('\.','0', regex=True,inplace=True)
I think you have to give regex for '.' character to replace it.
Upvotes: 4
Reputation: 2755
I think you just need to add inplace=True if you want to update the df object inplace:
df.replace('.', np.nan, inplace=True)
Otherwise df.replace returns an new dataframe which you must assign to a variable to access:
new_df = df.replace('.', np.nan)
Upvotes: 2
Reputation: 44658
In this case creating a mask / subset and replacing the string literal would be better than attempting to use replace. You'll probably also want to convert your series to numeric you can also do it all in one step if you know your data is clean other than the "." character:
subs = df.treasury_rate == "."
df.loc[subs, 'treasure_rate'] = np.nan
# OR
df.treasury_rate = pd.to_numeric('treasury_rate', errors='coerce')
You could likely run the last line and have the dots replaced with NaN
Upvotes: 3