Reputation: 305
I have a Pandas DataFrame on which I would like to do some manipulations. First I sort my dataframe on the entropy using this code:
entropy_dataframe.sort_values(by='entropy',inplace=True,ascending=False)
This gives me the following dataframe (<class 'pandas.core.frame.DataFrame'>):
      entropy    identifier
486  1.000000  3.955030e+09
584  1.000000  8.526030e+09
397  1.000000  5.623020e+09
819  0.999700  1.678030e+09
..        ...           ...
179  0.000000  3.724020e+09
766  0.000000  6.163020e+09
770  0.000000  6.163020e+09
462  0.000000  7.005020e+09
135  0.000000  3.069001e+09
Now I would like to select the 10 rows with the highest entropy and return a list of the corresponding 10 identifiers (as integers). I have tried selecting them using either:
entropy_top10 = entropy_dataframe.head(10)['identifier']
And:
entropy_top10 = entropy_dataframe[:10]
entropy_top10 = entropy_top10['identifier']
Both give the following result (<class 'pandas.core.series.Series'>):
397    2.623020e+09
823    8.678030e+09
584    2.526030e+09
486    7.955030e+09
396    2.623020e+09
555    9.768020e+09
492    7.955030e+09
850    9.606020e+09
159    2.785020e+09
745    4.609030e+09
Name: identifier, dtype: float64
Even though both work, the pain starts after this operation as I now would like to change this Pandas Series with dtype float64 to a list of integers.
I have tried the following:
entropy_top10= np.array(entropy_top10,dtype=pd.Series)
entropy_top10= entropy_top10.astype(np.int64)
entropy_top10= entropy_top10.tolist()
This results in (<type 'list'>):
[7955032207L, 8613030044L, 2623057011L, 2526030291L, 7951030016L, 2623020357L, 9768028572L, 9606023013L, 2785021210L, 9768023351L]
Which is a list of longs (while I'm looking for integers).
Anyone that can help me out here? Thanks in advance!
--- EDIT ---
The problem lies 'here'. When I remove entropy_top10= entropy_top10.tolist(), the result is a <type 'numpy.ndarray'> with elements of dtype numpy.int64. When I add the line back, I get a <type 'list'> with elements of type long.
Upvotes: 0
Views: 1428
Reputation: 5210
Since users may not skim through all of the comments on your original question, I'll condense our results into a single answer.
According to sys.maxint, a 32-bit version of Python is running. Since some list elements are larger than maxint (2**31 - 1), they are stored as long values.
The transformation entropy_top10.astype(np.int64) creates a numpy.ndarray of 64-bit integers in numpy's own data type. numpy ships a 64-bit integer data type even for 32-bit Python (it is not a native Python type at all).
The transformation entropy_top10.tolist() converts the numpy data type back to Python's native data type. Since you are running a 32-bit build, the int64 values can only be converted to the long type.
On a 64-bit Python version, the tolist() transformation would most likely yield native Python int values, because the values would fit into the regular 64-bit integer (maximum 2**63 - 1).
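The points above can be checked directly. The following is a minimal sketch, assuming NumPy is installed. It is written for Python 3, where sys.maxint no longer exists (sys.maxsize is the nearest equivalent) and long has been merged into a single arbitrary-precision int, so tolist() yields plain int there on both 32-bit and 64-bit builds. The two sample values are taken from the list in the question.

```python
import sys
import numpy as np

# sys.maxsize is 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
is_64bit = sys.maxsize > 2**32
print('64-bit interpreter:', is_64bit)

# Two values from the question; both exceed 2**31 - 1.
values = np.array([7955032207, 8613030044], dtype=np.int64)
native = values.tolist()

print(type(values[0]))  # NumPy's own 64-bit integer type
print(type(native[0]))  # Python's native integer type
```

On the asker's 32-bit Python 2 setup, the second print would show long instead, for the reasons listed above.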
The reason your list contains long items is the translation between numpy data types and the native data types of your installed Python version. Regardless of which Python version runs the code, numpy is consistent in its own data types.
To make the difference between the list's type and the items' types clearer, see this code example:
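from __future__ import print_function  # make print() behave as a function on Python 2
import numpy as np
a = np.array([3123123123, 1512451234], dtype=np.int64)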
print('ALL NUMPY')
print(' List items', a)
print(' List type', type(a))
print(' Item type', type(a[0]))
l = a.tolist()
print('ALL PYTHON NATIVE')
print(' List items', l)
print(' List type', type(l))
print(' Item type', type(l[0]))
c = [i for i in a]
print('NATIVE LIST, NUMPY TYPE')
print(' List items', c)
print(' List type', type(c))
print(' Item type', type(c[0]))
It gives the following output:
ALL NUMPY
List items [3123123123 1512451234]
List type <type 'numpy.ndarray'>
Item type <type 'numpy.int64'>
ALL PYTHON NATIVE
List items [3123123123L, 1512451234L]
List type <type 'list'>
Item type <type 'long'>
NATIVE LIST, NUMPY TYPE
List items [3123123123, 1512451234]
List type <type 'list'>
Item type <type 'numpy.int64'>
From this output we can see that numpy's tolist() function not only converts the container from numpy.ndarray to list but also converts every item from numpy.int64 to long. Manually building a native list from the array (using a comprehension here) yields a native Python list whose elements are still of type numpy.int64.
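Applied to the original question, the whole pipeline could look roughly like the sketch below, assuming pandas and NumPy are available. The sample data and the helper name top_identifiers are made up for illustration; only the column names entropy and identifier come from the question. On Python 3 the returned items are plain int; on a 32-bit Python 2 they would still be long for values above sys.maxint.

```python
import numpy as np
import pandas as pd


def top_identifiers(frame, n=10):
    """Return the identifiers of the n highest-entropy rows as native ints."""
    # Hypothetical helper: sort a copy, keep the first n rows, then convert.
    top = frame.sort_values(by='entropy', ascending=False).head(n)
    return top['identifier'].astype(np.int64).tolist()


# Hypothetical sample data, not the asker's real DataFrame.
entropy_dataframe = pd.DataFrame({
    'entropy': [0.2, 1.0, 0.0, 0.9997],
    'identifier': [3.724020e+09, 3.955030e+09, 3.069001e+09, 1.678030e+09],
})

result = top_identifiers(entropy_dataframe, n=2)
print(result)
print(type(result[0]))
```

Sorting without inplace=True leaves the caller's DataFrame untouched, which keeps the helper free of side effects.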
Upvotes: 2