Tomas

Reputation: 305

Python - changing Pandas DataFrame with float64's to list with integers

I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:

entropy_dataframe.sort_values(by='entropy',inplace=True,ascending=False)

This gives me the following dataframe (<class 'pandas.core.frame.DataFrame'>):

      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09

Now I would like to select the 10 rows with the highest entropy and return a list with the corresponding 10 identifiers (as integers). I have tried selecting the top 10 identifiers by either using:

entropy_top10 = entropy_dataframe.head(10)['identifier']

And:

entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']

Which both give the following result (<class 'pandas.core.series.Series'>):

397    2.623020e+09
823    8.678030e+09
584    2.526030e+09
486    7.955030e+09
396    2.623020e+09
555    9.768020e+09
492    7.955030e+09
850    9.606020e+09
159    2.785020e+09
745    4.609030e+09
Name: identifier, dtype: float64

Even though both work, the pain starts after this operation as I now would like to change this Pandas Series with dtype float64 to a list of integers.

I have tried the following:

entropy_top10= np.array(entropy_top10,dtype=pd.Series)
entropy_top10= entropy_top10.astype(np.int64)
entropy_top10= entropy_top10.tolist()

Which results in (<type 'list'>):

[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]

Which is a list of longs (while I'm looking for integers).

Anyone that can help me out here? Thanks in advance!

--- EDIT ---

The problem seems to lie in the last step. When I remove entropy_top10= entropy_top10.tolist(), the result is a <type 'numpy.ndarray'> with elements of dtype numpy.int64. When I add the line back, I get a <type 'list'> with elements of type long.

Upvotes: 0

Views: 1428

Answers (1)

jbndlr

Reputation: 5210

Since users may not skim through all of the comments on your original question, I'll condense our results into a single answer.

  • According to sys.maxint, you are running a 32-bit build of Python. Since some list elements are larger than sys.maxint (2**31 - 1), they are stored as long values.

  • The transformation entropy_top10.astype(np.int64) creates a numpy.ndarray of 64-bit integers in numpy's own data type. numpy ships a 64-bit integer type even for 32-bit Python; it is not a native Python type.

  • The transformation entropy_top10.tolist() converts the numpy data type back to Python's native data type. Since you are running 32-bit Python, an int64 can only be converted to long.

  • On a 64-bit Python build, tolist() would most likely produce native int values, because the values would fit into the regular 64-bit integer (maximum 2**63 - 1).

The reason your list contains long items is the translation between numpy data types and the native data types of your installed Python version. Regardless of the Python version used to run the code, numpy is consistent in its own data types.
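As a rough sketch of the whole pipeline from the question, the cast can also be done inside pandas before converting to a list. The small DataFrame below is a made-up stand-in for entropy_dataframe (the values are illustrative, not the asker's data), and the block assumes Python 3, where there is only one native integer type. On 32-bit Python 2 the same code would still hand back long items for values above sys.maxint.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's data; the values are made up.
entropy_dataframe = pd.DataFrame({
    "entropy": [0.2, 1.0, 0.9997, 0.0],
    "identifier": [3724020357.0, 3955032207.0, 1678030044.0, 7005023351.0],
})

# Sort by entropy, highest first (returns a new DataFrame).
entropy_dataframe = entropy_dataframe.sort_values(by="entropy", ascending=False)

# Take the top rows, cast float64 -> int64 inside pandas,
# then convert the numpy items to native Python objects.
top_ids = entropy_dataframe["identifier"].head(2).astype(np.int64).tolist()

print(top_ids)
print({type(i).__name__ for i in top_ids})
```

Whether the items show up as int or long, they compare and calculate the same way, so for most purposes the list is already usable as a list of integers.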

Edit

To make the difference between the list's type and the items' types clearer, see this code example:

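from __future__ import print_function  # needed on Python 2 so print(a, b) prints as shown below
import numpy as np

a = np.array([3123123123, 1512451234], dtype=np.int64)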
print('ALL NUMPY')
print('  List items', a)
print('  List type', type(a))
print('  Item type', type(a[0]))

l = a.tolist()
print('ALL PYTHON NATIVE')
print('  List items', l)
print('  List type', type(l))
print('  Item type', type(l[0]))

c = [i for i in a]
print('NATIVE LIST, NUMPY TYPE')
print('  List items', c)
print('  List type', type(c))
print('  Item type', type(c[0]))

It gives the following output:

ALL NUMPY
  List items [3123123123 1512451234]
  List type <type 'numpy.ndarray'>
  Item type <type 'numpy.int64'>
ALL PYTHON NATIVE
  List items [3123123123L, 1512451234L]
  List type <type 'list'>
  Item type <type 'long'>
NATIVE LIST, NUMPY TYPE
  List items [3123123123, 1512451234]
  List type <type 'list'>
  Item type <type 'numpy.int64'>

From this output we can see that numpy's tolist() method not only converts the container from numpy.ndarray to list but also converts every item from numpy.int64 to long. Building the list manually (using a comprehension here) yields a native Python list whose elements are still of type numpy.int64.
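For comparison, here is a minimal sketch of the same three conversions plus an explicit int() call per item, written for Python 3 (an assumption; the question itself runs Python 2). The variable names native, wrapped and converted are only illustrative.

```python
import numpy as np

a = np.array([3123123123, 1512451234], dtype=np.int64)

native = a.tolist()              # items converted to Python's own integer type
wrapped = list(a)                # plain list, items are still numpy.int64
converted = [int(i) for i in a]  # explicit per-item conversion to a native integer

native_types = {type(i).__name__ for i in native}
wrapped_types = {type(i).__name__ for i in wrapped}
converted_types = {type(i).__name__ for i in converted}
print(native_types, wrapped_types, converted_types)
```

On Python 3, tolist() and the int() comprehension both yield plain int items, while list(a) keeps the numpy scalars.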

Upvotes: 2
