Giordano
Giordano

Reputation: 5570

Convert to integer numeric strings pandas dataframe

I need to merge two pandas data frames using a columns which contains numerical values.
For example, the two data frames could be like the following ones:

data frame "a"

    a1   b1
0  "x"  13560
1  "y"  193309
2  "z"  38090
3  "k"  37212

data frame "b"

    a2    b2
0  "x"  13,56
1  "y"  193309
2  "z"  38,09
3  "k"  37212

What i need to do, is merge a with b on column b1/b2.
The problem is that as you can see, some values of data frame b', are a little bit different. First of all, b' values are not integers but strings and furthermore, the values which end with 0 are "rounded" (13560 --> 13,56).

What i've tried to do, is replace the comma and then cast them to int, but it doesn't work; more in details this procedure doesn't add the missing zero.
This is the code that i've tried:

b['b2'] = b['b2'].str.replace(",", "")
b['b2'] = b['b2'].astype(np.int64) # np is numpy

Is there any procedure that i can use to fix this problem?

Upvotes: 1

Views: 60

Answers (2)

Chris Adams
Chris Adams

Reputation: 18647

Correcting the column first with a apply and a lambda function:

b.b2 = b.b2.apply(lambda x: int(x.replace(',','')) * 10 if ',' in x else int(x))

Upvotes: 0

jezrael
jezrael

Reputation: 862441

I believe need create boolean mask for specify which values has to be multiple:

#or add parameter  thousands=',' to read_csv like suggest @Inder
b['b2'] = b['b2'].str.replace(",", "", regex=True).astype(np.int64)

mask = b['b2'] < 10000
b['b2'] = np.where(mask, b['b2'] * 10, b['b2'])
print (b)
  a2      b2
0  x   13560
1  y  193309
2  z   38090
3  k   37212

Upvotes: 2

Related Questions