aabujamra
aabujamra

Reputation: 4636

Turning string currency numbers inside a dataframe into float using Python or Pandas

I have this dataframe:

df:
                    CNPJ       last_revenues
0     01.637.895/0001-32       R$ 12.696.658
1     02.916.265/0001-60   R$ 162.914.526,04
2     02.932.074/0001-91        R$ 1.928.312
3     03.853.896/0001-40    R$ 19.333.453,92

The values in the column "last_revenues" are strings of currency values in latin format. I need to turn them into floats, it would have to look like this:

df:
                    CNPJ       last_revenues
0     01.637.895/0001-32       12696658.0000
1     02.916.265/0001-60      162914526.0400
2     02.932.074/0001-91        1928312.0000
3     03.853.896/0001-40       19333453.9200

Edit

Actually I noticed that there are a few nan's in the middle of it. So my df looks like this:

df:
                    CNPJ       last_revenues
0     01.637.895/0001-32       R$ 12.696.658
1     02.916.265/0001-60   R$ 162.914.526,04
2     02.932.074/0001-91                 nan
3     03.853.896/0001-40    R$ 19.333.453,92

And I want to make it look like this:

df:
                    CNPJ       last_revenues
0     01.637.895/0001-32       12696658.0000
1     02.916.265/0001-60      162914526.0400
2     02.932.074/0001-91                 nan
3     03.853.896/0001-40       19333453.9200

Upvotes: 1

Views: 432

Answers (3)

Rojo Pitiavana
Rojo Pitiavana

Reputation: 41

You can also try:

df.last_revenues.replace('R\$ ','',inplace=True,regex=True)
df.last_revenues.replace('\.','',inplace=True,regex=True)
df.last_revenues.replace(',','.',inplace=True,regex=True)

Upvotes: 1

Dhruv Aggarwal
Dhruv Aggarwal

Reputation: 177

There are 2 things required:

  1. Detect nan
  2. Trim the leading currency (separated by space) (rstr.split(' ')[1])
  3. Convert the currency string to int

    df['last_revenues'] = df['last_revenues'].apply(lambda rstr: '{:5.4f}'.format(float(rstr.split(' ')[1].replace('.','').replace(',', '.'))) if rstr != 'nan' else 'nan')
    

Upvotes: 1

JohanL
JohanL

Reputation: 6891

One way to do it, is to apply a conversion function to each element in last_revenues. This can be done in many ways, but as a lambda one liner it can be written as:

df['last_revenues'] = df['last_revenues'].dropna().apply(lambda rstr: float(rstr.replace('.','').replace(',','.')[3:]))

Here I first I drop all nan values, thereafter I replace all dots with nothing, then the comma with a dot. Finally I remove the initial currency characters and convert the resulting string to a float.

Upvotes: 2

Related Questions