Kevad

Reputation: 2991

Pandas Datatype Conversion issue

I have a pandas Series of unicode strings that looks like this:

>>> some_id
0    400742773466599424
1    400740479161352192
2    398829879107809281
3    398823962966097921
4    398799036070653952
Name: some_id, dtype: object

I can do the following, but I lose precision:

>>> some_id.convert_objects(convert_numeric=True)
0    4.007428e+17
1    4.007405e+17
2    3.988299e+17
3    3.988240e+17
4    3.987990e+17
Name: some_id, dtype: float64

But if I try some_id.astype(int), I get this error: ValueError: invalid literal for long() with base 10

How can I convert them to int or int64 while preserving the precision? I am using Pandas 0.16.2.

UPDATE: I found the problem. some_id.astype(int), or any equivalent form of it, works. Somewhere among the thousands of rows I have, some_id contains a text string (not a numeric string), and that was stopping the int64 conversion.
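A minimal sketch of how one might locate such rows, assuming every entry is a Python string and that valid IDs are non-negative integers (so a digits-only check is enough). The sample values, including 'not_an_id', are made up for illustration. Series.str.isdigit flags the offending rows, and the clean rows are converted straight from strings, so they never pass through float64:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numeric ID strings plus one stray text value.
some_id = pd.Series(['400742773466599424', '400740479161352192',
                     'not_an_id', '398829879107809281'], name='some_id')

# True where the entry consists only of digits
# (assumption: IDs are non-negative, so no '-' sign to allow for).
is_numeric = some_id.str.isdigit()

# Rows that would make astype(int) raise.
bad_rows = some_id[~is_numeric]
print(bad_rows)

# Convert the clean rows directly from str to int64; no float step, no lost digits.
clean_ids = some_id[is_numeric].astype(np.int64)
print(clean_ids)
```

If the real column can also contain NaN, str.isdigit returns NaN for those entries, so the mask would need filling (for example with False) before indexing.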

Thanks

Upvotes: 0

Views: 369

Answers (2)

Alexander

Reputation: 109546

Original series of numbers:

import numpy as np
import pandas as pd

s = pd.Series([400742773466599424, 400740479161352192, 398829879107809281,
               398823962966097921, 398799036070653952], dtype=object)

>>> s
0    400742773466599424
1    400740479161352192
2    398829879107809281
3    398823962966097921
4    398799036070653952
dtype: object

Simply converting using .astype(int) should be sufficient.

>>> s.astype(int)
0    400742773466599424
1    400740479161352192
2    398829879107809281
3    398823962966097921
4    398799036070653952
dtype: int64

As an interesting side note (as pointed out by @Warren Weckesser and @DSM), you can lose precision when the values pass through floating point, because a float64 cannot represent every large integer exactly. For example, int(1e23) evaluates to 99999999999999991611392L. I'm not sure whether this is the precision you were referring to, or whether you merely meant the displayed precision.

With your sample data above, two of the numbers come out off by one after a round trip through float:

>>> s.astype(np.int64) - s.astype(float).astype(np.int64)
0    0
1    0
2    1
3    1
4    0
dtype: int64
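The off-by-one results come from float64's 53-bit significand: above 2**53, neighbouring float64 values are more than 1 apart, and near 4e17 they are 64 apart, so each ID is rounded to the nearest multiple of 64. The IDs in rows 0, 1 and 4 happen to be multiples of 64 already; rows 2 and 3 are not. A small plain-NumPy sketch (not part of the original answer) that reproduces the differences above:

```python
import numpy as np

ids = [400742773466599424, 400740479161352192, 398829879107809281,
       398823962966097921, 398799036070653952]

# Integers beyond 2**53 cannot all be stored exactly in a float64.
print(2**53)                           # 9007199254740992

# Gap between adjacent float64 values at the magnitude of these IDs.
print(np.spacing(np.float64(ids[0])))  # 64.0

# Round-trip each ID through float64 and measure what was lost.
diffs = [i - int(np.float64(i)) for i in ids]
print(diffs)                           # [0, 0, 1, 1, 0]
```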

Upvotes: 0

Alex

Reputation: 826

Dagrha is right; you should be able to use:

some_id.astype(np.int64)

The dtype will then be:

In[40]: some_id.dtypes
Out[41]: 
some_id    int64
dtype: object
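A minimal self-contained check of the string case from the question, assuming the column holds only digit strings (sample values copied from the question). Converting with astype(np.int64) goes directly from str to a 64-bit integer, so no float64 step can round away the last digits:

```python
import numpy as np
import pandas as pd

# IDs stored as strings (object dtype), as in the question.
some_id = pd.Series(['400742773466599424', '400740479161352192',
                     '398829879107809281', '398823962966097921',
                     '398799036070653952'], name='some_id')

# Direct str -> int64 conversion; every digit is preserved.
converted = some_id.astype(np.int64)
print(converted.dtype)    # int64
print(converted.iloc[2])  # 398829879107809281
```

Naming np.int64 explicitly, rather than int, also avoids depending on the platform's default integer size.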

Upvotes: 1
