Replace method not removing string from pandas dataframe column

Question

Hi, I have a pandas DataFrame column that I need to convert to numeric.

First I need to remove the 'M' (for millions) from the data, and then I can use the to_numeric function. But the end result is just a series of NaNs. Looking further into it, to_numeric isn't working because the column still contains an 'M', which means the replace method isn't working.

Why is the replace method not removing the 'M'?

#!/usr/local/bin/python3

import requests
import pandas as pd

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}

url = 'https://www.sharesoutstandinghistory.com/ivv/'
r = requests.get(url, headers=headers)
df = pd.read_html(r.content, header=0)[1]
df.columns = ['Date', 'Value']  # set column names

print(df)

df['Value'].replace('M', '', inplace=True)  # replace M

df['Value'] = pd.to_numeric(df['Value'], errors='coerce')  # set to numeric

print(df)

Here is what I get:

           Date    Value
0      1/6/2010  194.70M
1     1/11/2010  194.45M
2     1/19/2010  193.85M
3     1/21/2010  193.70M
4     1/25/2010  192.90M
...         ...      ...
1049   3/9/2020  652.75M
1050  3/16/2020  654.45M
1051  3/23/2020  627.00M
1052   4/6/2020  631.45M
1053  4/13/2020  633.05M

[1054 rows x 2 columns]
           Date  Value
0      1/6/2010    NaN
1     1/11/2010    NaN
2     1/19/2010    NaN
3     1/21/2010    NaN
4     1/25/2010    NaN
...         ...    ...
1049   3/9/2020    NaN
1050  3/16/2020    NaN
1051  3/23/2020    NaN
1052   4/6/2020    NaN
1053  4/13/2020    NaN
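To reproduce this without fetching the page, here is a minimal sketch that uses a few values copied from the output above (the hand-built DataFrame is an assumption standing in for the scraped table). It shows that replace('M', '') leaves the strings unchanged, so to_numeric with errors='coerce' turns every value into NaN:

```python
import pandas as pd

# Hypothetical stand-in for the scraped table, using values from the printed output
df = pd.DataFrame({
    'Date': ['1/6/2010', '1/11/2010', '4/13/2020'],
    'Value': ['194.70M', '194.45M', '633.05M'],
})

# Without regex=True, replace only matches cells equal to 'M', so nothing changes
replaced = df['Value'].replace('M', '')
print(replaced.tolist())  # ['194.70M', '194.45M', '633.05M']

# '194.70M' cannot be parsed as a number, so errors='coerce' yields NaN for every row
numeric = pd.to_numeric(replaced, errors='coerce')
print(numeric.isna().all())  # True
```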

jezrael · Accepted Answer

It does not remove the M because the regex=True parameter is missing. Without it, replace only replaces cells whose entire value equals 'M'; regex=True is necessary for substring replacement:
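A small sketch of the difference, on a hypothetical Series where one cell is exactly 'M': with the default regex=False only that whole-cell match is replaced, while regex=True removes the substring from every value.

```python
import pandas as pd

# Hypothetical sample: two values from the question plus one cell that is exactly 'M'
s = pd.Series(['194.70M', 'M', '633.05M'])

# Default (regex=False): only cells whose entire value equals 'M' are replaced
print(s.replace('M', '').tolist())  # ['194.70M', '', '633.05M']

# regex=True: 'M' is treated as a pattern and removed wherever it appears
print(s.replace('M', '', regex=True).tolist())  # ['194.70', '', '633.05']
```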

df['Value'] = pd.to_numeric(df['Value'].replace('M', '', regex=True), errors='coerce')
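An equivalent sketch uses the .str accessor, where Series.str.replace always works on substrings; passing regex=False treats 'M' as a literal string rather than a pattern. The sample values come from the question's output. Multiplying by 1_000_000 to turn 'M' (millions) into absolute counts is an optional extra step and an assumption about what you want, not part of the accepted answer.

```python
import pandas as pd

# Sample values taken from the question's output
df = pd.DataFrame({'Value': ['194.70M', '194.45M', '633.05M']})

# str.replace operates on substrings; regex=False matches 'M' literally
df['Value'] = pd.to_numeric(df['Value'].str.replace('M', '', regex=False), errors='coerce')
print(df['Value'].tolist())

# Optional (assumption): express the values as plain counts instead of millions
df['Shares'] = df['Value'] * 1_000_000
print(df)
```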

I think inplace is not good practice; check this and this.
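Following that advice, here is a minimal sketch that assigns the result back instead of using inplace=True (the two-row DataFrame is a hypothetical sample). Calling replace(..., inplace=True) on df['Value'] operates on a column selection rather than on df itself; with pandas Copy-on-Write enabled, such a chained call may not update df at all, whereas explicit assignment works either way.

```python
import pandas as pd

# Hypothetical sample using values from the question's output
df = pd.DataFrame({'Date': ['1/6/2010', '4/13/2020'], 'Value': ['194.70M', '633.05M']})

# Assign the returned Series back to the column instead of relying on inplace=True
df['Value'] = df['Value'].replace('M', '', regex=True)
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')

print(df)
```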
