Reputation: 23
I have a dataframe with several string columns holding values such as '31.1 M' or '1.2 K', which represent 31,100,000 and 1,200 respectively.
I'm trying to convert these columns to a numeric representation. My challenge is the 'M' and the 'K' in these rows. I was thinking of using a for loop to remove the character, multiply the number by the matching factor (depending on whether it is 'M' or 'K'), and then convert to a float.
Any thoughts on how to approach this? This one has me stumped!
Thanks so much, SS
Upvotes: 1
Views: 773
Reputation: 97
I would write a function with arguments that you can later expand on.
suffix_values = {'K': 1000, 'M': 1000000}

def decode_string(string, suffix_values=suffix_values):
    # strings are immutable, so the source is never modified
    # if any key in suffix_values exists in the string, strip it and scale
    for suffix, value in suffix_values.items():
        if suffix in string:
            return float(string.replace(suffix, '').strip()) * value
    # no suffix found: treat the string as a plain number
    return float(string)
Example:
decode_string('2.3M')
2300000.0
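A slightly stricter variant, as a sketch only: the substring check above matches a 'K' or 'M' anywhere in the text, so this version uses a regular expression that accepts a single optional suffix at the end. The names parse_suffixed and SUFFIXES are hypothetical, and it assumes values look like '1,200', '31.1 M' or '1.2K'.

```python
import re

# Hypothetical multiplier table; extend it (e.g. 'B': 1_000_000_000) if needed.
SUFFIXES = {'K': 1_000, 'M': 1_000_000}

# A number (commas allowed), optional whitespace, optional one-letter suffix.
_PATTERN = re.compile(r'^\s*([-+]?[\d,]*\.?\d+)\s*([A-Za-z]?)\s*$')

def parse_suffixed(text, suffixes=SUFFIXES):
    """Convert strings such as '31.1 M' or '1,200' to a float."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f'cannot parse {text!r}')
    number, suffix = match.groups()
    value = float(number.replace(',', ''))
    if not suffix:
        return value  # plain number, nothing to scale
    try:
        return value * suffixes[suffix.upper()]
    except KeyError:
        raise ValueError(f'unknown suffix {suffix!r} in {text!r}') from None
```

Raising on unknown input instead of returning None makes bad rows visible early; whether that is the right trade-off depends on how clean your data is.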
Upvotes: 1
Reputation: 36736
You can drop commas, replace M with e6 and K with e3, and then convert to float.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['1,200', '31.1 M', '1.2 K']})
B_new = (
    df.B.str.replace(',', '', regex=False)
    # regex=True is needed in pandas 2.0+, where patterns are literal by default
    .str.replace(r'\s*M', 'e6', regex=True)
    .str.replace(r'\s*K', 'e3', regex=True)
    .astype(float)
)
B_new
# returns:
0        1200.0
1    31100000.0
2        1200.0
Name: B, dtype: float64
Upvotes: 4
Reputation: 2068
For a single value, this might be the easiest solution. Without any of your code to work with, though, it's hard to come up with a version that scales well, should speed be a major concern:
string = '31.1 M'
# ' M' becomes 'e6', so float() reads '31.1e6' as 31,100,000
value = float(string.replace(' M', 'e6'))
print(value)
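If speed does matter on a full dataframe, a vectorized version is one option. The sketch below assumes pandas is installed and that every value is a number with an optional trailing K or M. The column names, the MULTIPLIERS table and the to_number helper are all made up for illustration.

```python
import pandas as pd

# Hypothetical multiplier table; rows without a suffix are scaled by 1.
MULTIPLIERS = {'K': 1e3, 'M': 1e6}

def to_number(series):
    """Convert a Series of strings like '31.1 M' or '1,200' to floats."""
    # Drop thousands separators, then split into number and optional suffix.
    parts = series.str.replace(',', '', regex=False).str.extract(
        r'^\s*(?P<number>[-+]?\d*\.?\d+)\s*(?P<suffix>[KM]?)\s*$'
    )
    # An empty suffix is not in the table, so it maps to NaN and becomes 1.
    factor = parts['suffix'].map(MULTIPLIERS).fillna(1.0)
    return parts['number'].astype(float) * factor

# Made-up example data with two string columns.
df = pd.DataFrame({
    'views': ['31.1 M', '1.2 K', '1,200'],
    'likes': ['2 K', '15', '0.5 M'],
})
# Apply the same conversion to each column of interest.
df[['views', 'likes']] = df[['views', 'likes']].apply(to_number)
```

Values that do not match the pattern come out as NaN rather than raising, so it is worth checking the result with isna() before relying on it.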
Upvotes: 1