Reputation: 5570
I need to merge two pandas data frames using a columns which contains numerical values.
For example, the two data frames could be like the following ones:
data frame "a"
a1 b1
0 "x" 13560
1 "y" 193309
2 "z" 38090
3 "k" 37212
data frame "b"
a2 b2
0 "x" 13,56
1 "y" 193309
2 "z" 38,09
3 "k" 37212
What i need to do, is merge a with b on column b1/b2.
The problem is that as you can see, some values of data frame b', are a little bit different. First of all, b' values are not integers but strings and furthermore, the values which end with 0 are "rounded" (13560 --> 13,56).
What i've tried to do, is replace the comma and then cast them to int, but it doesn't work; more in details this procedure doesn't add the missing zero.
This is the code that i've tried:
b['b2'] = b['b2'].str.replace(",", "")
b['b2'] = b['b2'].astype(np.int64) # np is numpy
Is there any procedure that i can use to fix this problem?
Upvotes: 1
Views: 60
Reputation: 18647
Correcting the column first with a apply
and a lambda function:
b.b2 = b.b2.apply(lambda x: int(x.replace(',','')) * 10 if ',' in x else int(x))
Upvotes: 0
Reputation: 862441
I believe need create boolean mask for specify which values has to be multiple:
#or add parameter thousands=',' to read_csv like suggest @Inder
b['b2'] = b['b2'].str.replace(",", "", regex=True).astype(np.int64)
mask = b['b2'] < 10000
b['b2'] = np.where(mask, b['b2'] * 10, b['b2'])
print (b)
a2 b2
0 x 13560
1 y 193309
2 z 38090
3 k 37212
Upvotes: 2