Hi, I have a pandas DataFrame column that I need to convert to numeric.
First I need to remove the 'M' (for millions) from the data, and then I can use the to_numeric function. But the end result is just a series of NaNs. Looking further into it, to_numeric fails because the column still contains the 'M', which means the replace method isn't working.
Why is the replace method not removing the 'M'?
#!/usr/local/bin/python3
import requests
import pandas as pd
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
url = 'https://www.sharesoutstandinghistory.com/ivv/'
r = requests.get(url, headers=headers)
df = pd.read_html(r.content, header=0)[1]
df.columns = ['Date', 'Value'] # set column names
print(df)
df['Value'].replace('M', '', inplace=True) # replace M
df['Value'] = pd.to_numeric(df['Value'], errors='coerce') # set to numeric
print(df)
Here is what I get:
           Date    Value
0      1/6/2010  194.70M
1     1/11/2010  194.45M
2     1/19/2010  193.85M
3     1/21/2010  193.70M
4     1/25/2010  192.90M
...         ...      ...
1049   3/9/2020  652.75M
1050  3/16/2020  654.45M
1051  3/23/2020  627.00M
1052   4/6/2020  631.45M
1053  4/13/2020  633.05M
[1054 rows x 2 columns]
           Date  Value
0      1/6/2010    NaN
1     1/11/2010    NaN
2     1/19/2010    NaN
3     1/21/2010    NaN
4     1/25/2010    NaN
...         ...    ...
1049   3/9/2020    NaN
1050  3/16/2020    NaN
1051  3/23/2020    NaN
1052   4/6/2020    NaN
1053  4/13/2020    NaN
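A minimal offline reproduction of the output above, assuming a two-row DataFrame with hypothetical values in place of the scraped page: the replace call leaves every string unchanged, and to_numeric with errors='coerce' then turns each unparseable string into NaN.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped 'Value' column.
df = pd.DataFrame({'Date': ['1/6/2010', '1/11/2010'],
                   'Value': ['194.70M', '194.45M']})

# With a plain string and no regex=True, replace() only swaps cells whose
# entire value equals 'M', so '194.70M' is left untouched.
unchanged = df['Value'].replace('M', '')
print(unchanged.tolist())  # ['194.70M', '194.45M']

# to_numeric cannot parse '194.70M'; errors='coerce' turns it into NaN.
coerced = pd.to_numeric(unchanged, errors='coerce')
print(coerced.isna().all())  # True
```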
It does not remove the M because, without the regex=True parameter, replace only replaces cell values that are exactly 'M'; regex=True is necessary for substring replacement:
df['Value'] = pd.to_numeric(df['Value'].replace('M', '', regex=True), errors='coerce')
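A small sketch of this fix on hypothetical sample values. The second variant, Series.str.replace with regex=False, is an alternative not mentioned in the answer; it treats 'M' as a literal substring and should give the same result.

```python
import pandas as pd

# Hypothetical sample values in the same format as the scraped column.
values = pd.Series(['194.70M', '194.45M', '652.75M'])

# regex=True makes replace() match 'M' anywhere inside each string.
via_replace = pd.to_numeric(values.replace('M', '', regex=True), errors='coerce')

# Alternative via the string accessor; regex=False treats 'M' as a literal.
via_str = pd.to_numeric(values.str.replace('M', '', regex=False))

print(via_replace.tolist())  # [194.7, 194.45, 652.75]
```

Since 'M' has no special meaning in a regular expression, both variants behave the same here; a pattern containing characters such as '.' or '$' would need escaping under regex=True.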
I think using inplace is not good practice; check this and this.
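A sketch of the assignment style instead of inplace=True, using hypothetical data. The linked references are not reproduced here; one commonly cited concern is that calling an in-place method on a column selection such as df['Value'] acts on an intermediate object, and pandas versions with Copy-on-Write enabled do not propagate such chained in-place changes back to df.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'Value': ['194.70M', '194.45M']})

# Assign the returned Series back to the column instead of calling
# replace(..., inplace=True) on the df['Value'] selection; the update is
# explicit and does not depend on whether the selection is a view or a copy.
df['Value'] = df['Value'].replace('M', '', regex=True)
df['Value'] = pd.to_numeric(df['Value'])

print(df['Value'].dtype)  # float64
```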
Alternatively, you can remove the M by slicing off the last character with the .str accessor:
df.Value = df.Value.str[:-1]
This assumes every value ends with exactly one trailing M.
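A sketch comparing slicing with Series.str.removesuffix (available in pandas 1.4 and later), which is a swapped-in alternative rather than part of this answer. The third sample value is hypothetical and deliberately lacks the 'M' to show what str[:-1] does to such a value.

```python
import pandas as pd

# Hypothetical values; the last one deliberately has no trailing 'M'.
values = pd.Series(['194.70M', '194.45M', '633.05'])

# str[:-1] drops the final character unconditionally, even a real digit.
sliced = values.str[:-1]
print(sliced.tolist())  # ['194.70', '194.45', '633.0']

# str.removesuffix strips 'M' only where it is actually present.
safe = pd.to_numeric(values.str.removesuffix('M'))
print(safe.tolist())  # [194.7, 194.45, 633.05]
```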