sshaikh

Reputation: 23

Converting a string number such as 31.1 M to a numeric / float?

I have a dataframe with several columns that are string datatypes with a number such as '31.1 M' or '1.2 K' which represent 31,100,000 and 1,200 respectively.

I'm attempting to convert these columns into a numeric representation. My challenge is the 'M' and 'K' in these rows. I was thinking of using a for loop to a) remove the character, then b) multiply the number by the matching factor (depending on whether the character is 'M' or 'K'), and finally convert the result to a float.

Any thoughts on how to approach this? This one has me stumped!

Thanks so much, SS

Upvotes: 1

Views: 773

Answers (3)

Mithun Kinarullathil

Reputation: 97

I would write a function with arguments that you can later expand on.

suffix_values = {'K': 1000, 'M': 1000000}

def decode_string(string, suffix_values=suffix_values):
    # Strings are immutable, so there is no need to copy the input.
    # If any key in suffix_values appears in the string, strip it and scale.
    for suffix, value in suffix_values.items():
        if suffix in string:
            return float(string.replace(suffix, '').strip()) * value
    # No known suffix: treat the string as a plain number.
    return float(string)

Example:

decode_string('2.3M')
2300000.0
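
A slightly stricter variant of the same idea, as a minimal sketch only. It assumes each value is a single number optionally followed by a one-letter suffix, and that commas are thousands separators. The names `parse_number`, `SUFFIX_VALUES` and `_PATTERN` are made up for illustration.

```python
import re

# Hypothetical suffix table; extend it (e.g. with 'B') as needed.
SUFFIX_VALUES = {"K": 1_000, "M": 1_000_000}

# An optional sign, a number, then an optional single-letter suffix.
_PATTERN = re.compile(r"^\s*([-+]?\d*\.?\d+)\s*([A-Za-z]?)\s*$")

def parse_number(text, suffix_values=SUFFIX_VALUES):
    """Convert strings like '31.1 M', '1.2K' or '1,200' to a float."""
    match = _PATTERN.match(text.replace(",", ""))
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number, suffix = match.groups()
    if suffix and suffix.upper() not in suffix_values:
        raise ValueError(f"unknown suffix {suffix!r} in {text!r}")
    multiplier = suffix_values[suffix.upper()] if suffix else 1
    return float(number) * multiplier

print(parse_number("1.5 K"))  # 1500.0
print(parse_number("2 M"))    # 2000000.0
print(parse_number("1,200"))  # 1200.0
```

Raising on an unknown suffix, rather than silently returning None, makes bad rows easy to spot when the function is applied to a whole column.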

Upvotes: 1

James

Reputation: 36736

You can drop commas, replace M with e6 and replace K with e3 and then convert to float.

import pandas as pd

df = pd.DataFrame({'A': [1,2,3], 'B': ['1,200', '31.1 M', '1.2 K']})

B_new = (
    df.B.str.replace(',', '', regex=False)
        .str.replace(r'\s*M', 'e6', regex=True)
        .str.replace(r'\s*K', 'e3', regex=True)
        .astype(float)
)

B_new
# returns:
0        1200.0
1    31100000.0
2        1200.0
Name: B, dtype: float64
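
Since the question mentions several such columns, here is a rough sketch of how the same conversion might be applied to all of them at once. It assumes every column passed in holds strings in this format; the helper name `to_numeric_with_suffix`, the `MULTIPLIERS` table and the sample frame are made up for illustration.

```python
import pandas as pd

# Hypothetical multiplier table; extend as needed.
MULTIPLIERS = {"K": 1e3, "M": 1e6}

def to_numeric_with_suffix(col: pd.Series) -> pd.Series:
    """Convert strings like '31.1 M' or '1,200' in one column to floats."""
    cleaned = col.str.replace(",", "", regex=False).str.strip()
    # Split each value into its numeric part and an optional K/M suffix.
    parts = cleaned.str.extract(r"^(?P<num>[-+]?\d*\.?\d+)\s*(?P<suffix>[KM])?$")
    # Rows without a suffix get a factor of 1.
    factor = parts["suffix"].map(MULTIPLIERS).fillna(1.0)
    return parts["num"].astype(float) * factor

# Made-up sample data.
df = pd.DataFrame({
    "A": ["1,200", "31.1 M", "1.2 K"],
    "B": ["5 K", "2.5 M", "7"],
})

# DataFrame.apply passes each column to the helper as a Series.
converted = df.apply(to_numeric_with_suffix)
print(converted)
```

Values that do not match the pattern come out as NaN here, which may or may not be what you want; checking `converted.isna()` afterwards is one way to find them.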

Upvotes: 4

mapf

Reputation: 2068

For a single value, this might be the easiest solution. However, without any code to work with, it's hard to come up with a version that scales well, should speed be a major concern:

string = '31.1 M'
# Swap the suffix for an exponent so that float() can parse the string directly.
value = float(string.replace(' M', 'e6'))
print(value)

Upvotes: 1
