Reputation: 4864

replacing dots in pandas dataframe

I have a dataframe as in the image. (the numbers are really objects). Doing df.treasury_rate = pd.to_numeric(df.treasury_rate), predictably bombs. However, doing df.replace('.', np.nan) does not appear to get rid of the dot, so I am flummoxed. Any suggestions?

UPDATE pd.to_numeric takes an errors keyword, which, when set to coerce does the right thing, however, I am still confused as to why the . is not getting replaced.

UPDATE 2 As text:

treasury_rate
1962-02-09 4.05
1962-02-10 4.05
1962-02-11 4.05
1962-02-12 .
1962-02-13 4.03
1962-02-14 4.03
1962-02-15 4.02 
1962-02-16 4.02
1962-02-17 4.02
1962-02-18 4.02

Upvotes: 2

Answers (4)

Gonçalo Peres

Reputation: 13582

Given OP's dataframe

import pandas as pd
import datetime as dt

df = pd.DataFrame({'treasury_rate': [4.05, 4.05, 4.05, ".", 4.03, 4.03, 4.02, 4.02, 4.02, 4.0]},
                    index=[dt.datetime(1962, 2, 9), dt.datetime(1962, 2, 10), dt.datetime(1962, 2, 11), dt.datetime(1962, 2, 12), dt.datetime(1962, 2, 13), dt.datetime(1962, 2, 14), dt.datetime(1962, 2, 15), dt.datetime(1962, 2, 16), dt.datetime(1962, 2, 17), dt.datetime(1962, 2, 18)])


[Out]:
           treasury_rate
1962-02-09          4.05
1962-02-10          4.05
1962-02-11          4.05
1962-02-12             .
1962-02-13          4.03
1962-02-14          4.03
1962-02-15          4.02
1962-02-16          4.02
1962-02-17          4.02
1962-02-18           4.0

There are various ways one can go about replacing the dot with NaN.

Using regular expression (with Python's re)

import re

df_new = df.replace(re.compile('\.'), np.nan)

Using a custom lambda function and np.nan

import numpy as np

df_new = df.treasury_rate.apply(lambda x: np.nan if x == '.' else x)

Using pandas.DataFrame.replace and np.nan

import numpy as np

df_new = df.replace('.', np.nan)

# or

df_new = df.replace(to_replace=r'\.', value=np.nan, regex=True)

In all three cases the output is

            treasury_rate
1962-02-09           4.05
1962-02-10           4.05
1962-02-11           4.05
1962-02-12            NaN
1962-02-13           4.03
1962-02-14           4.03
1962-02-15           4.02
1962-02-16           4.02
1962-02-17           4.02
1962-02-18           4.00

Notes:

If one wants to do the operations on the same dataframe, one can use inplace=True such as
```
df.replace(re.compile('\.'), np.nan)
```

Pandas has their own null value. So, instead of np.nan, one might want to use pd.NA. For more information on the difference between those two: pd.NA vs np.nan for pandas If one uses pd.NA, the output would be

           treasury_rate
1962-02-09          4.05
1962-02-10          4.05
1962-02-11          4.05
1962-02-12          <NA>
1962-02-13          4.03
1962-02-14          4.03
1962-02-15          4.02
1962-02-16          4.02
1962-02-17          4.02
1962-02-18           4.0

There are strong opinions on using .apply. If one wants to read some: When should I (not) want to use pandas apply() in my code?

Upvotes: 0

Nusrath

Reputation: 527

df.replace('\.','0', regex=True,inplace=True)

I think you have to give regex for '.' character to replace it.

Upvotes: 4

cullzie

Reputation: 2755

I think you just need to add inplace=True if you want to update the df object inplace:

df.replace('.', np.nan, inplace=True)

Otherwise df.replace returns an new dataframe which you must assign to a variable to access:

new_df = df.replace('.', np.nan)

Upvotes: 2

Brandon Bertelsen

Reputation: 44658

In this case creating a mask / subset and replacing the string literal would be better than attempting to use replace. You'll probably also want to convert your series to numeric you can also do it all in one step if you know your data is clean other than the "." character:

subs = df.treasury_rate == "."
df.loc[subs, 'treasure_rate'] = np.nan
# OR
df.treasury_rate = pd.to_numeric('treasury_rate', errors='coerce')

You could likely run the last line and have the dots replaced with NaN

Upvotes: 3

replacing dots in pandas dataframe

Answers (4)

Related Questions