Reputation: 38649
Consider this simple example:
>>> import pandas as pd
>>> dfA = pd.DataFrame({
...     "key": [1, 3, 6, 10, 15, 21],
...     "columnA": [10, 20, 30, 40, 50, 60],
...     "columnB": [100, 200, 300, 400, 500, 600],
...     "columnC": [110, 202, 330, 404, 550, 606],
... })
>>> dfA
   key  columnA  columnB  columnC
0    1       10      100      110
1    3       20      200      202
2    6       30      300      330
3   10       40      400      404
4   15       50      500      550
5   21       60      600      606
If I want to use .loc here, it works fine:
>>> dfA.set_index('key').loc[2:16]
     columnA  columnB  columnC
key
3         20      200      202
6         30      300      330
10        40      400      404
15        50      500      550
... but if I do a "cast" (.astype) to Int64, it fails:
>>> dfA.astype('Int64').set_index('key').loc[2:16]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexing.py", line 1768, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexing.py", line 1912, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexing.py", line 1796, in _get_slice_axis
indexer = labels.slice_indexer(
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4712, in slice_indexer
start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4925, in slice_locs
start_slice = self.get_slice_bound(start, "left", kind)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4837, in get_slice_bound
label = self._maybe_cast_slice_bound(label, side, kind)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4789, in _maybe_cast_slice_bound
self._invalid_indexer("slice", label)
File "C:/msys64/mingw64/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3075, in _invalid_indexer
raise TypeError(
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'int'>
>>>
Why does this happen, and can I use this kind of .loc slicing with Int64 too? (I have to use Int64 because I read in .csv data that has missing values, and I don't want the values cast to floats, but I'd still like to use .loc as in the case above.)
EDIT: a bit more info:
>>> dfA.astype('Int64').loc(0)[0]['key']
1
>>> type(dfA.astype('Int64').loc(0)[0]['key'])
<class 'numpy.int64'>
Ok, so the actual numbers in case of dtype 'Int64' are of class 'numpy.int64' - but that still cannot be used for .loc in this case:
>>> import numpy as np
>>> dfA.astype('Int64').set_index('key').loc[np.int64(2):np.int64(2)]
...
TypeError: cannot do slice indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [2] of <class 'numpy.int64'>
Upvotes: 1
Views: 1029
Reputation: 5741
You can circumvent this by making key the index first and then converting to Int64:
dfA.set_index('key').astype('Int64').loc[2:16]
     columnA  columnB  columnC
key
3         20      200      202
6         30      300      330
10        40      400      404
15        50      500      550
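As a rough check of why the order matters, here is a minimal sketch (using a trimmed copy of the dfA from the question; the variable names indexed and subset are only for illustration). DataFrame.astype converts the data columns and leaves the index alone, so after set_index the key index should stay a plain int64 index while the remaining columns become nullable Int64:

```python
import pandas as pd

dfA = pd.DataFrame({
    "key": [1, 3, 6, 10, 15, 21],
    "columnA": [10, 20, 30, 40, 50, 60],
    "columnB": [100, 200, 300, 400, 500, 600],
})

# Move "key" into the index first; astype then only touches the data columns.
indexed = dfA.set_index("key").astype("Int64")

print(indexed.index.dtype)  # dtype of the index (not converted by astype)
print(indexed.dtypes)       # dtypes of the remaining data columns

# Label-based slicing; both endpoints are inclusive when present.
subset = indexed.loc[2:16]
print(subset)
```

If the index dtype printed here is int64, the slice goes through the ordinary integer index path rather than the one that raised the TypeError in the question.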
Or converting only your key column to old-fashioned int64:
df.index = df['key'].astype('int64')
That is, presuming it does not have <NA> values like your other columns apparently do.
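For the CSV case mentioned in the question, this could look roughly like the sketch below. The CSV content is made up for illustration, and it assumes the key column itself has no missing values; the data columns keep their <NA> entries as Int64 while only the index is converted to int64:

```python
import io

import pandas as pd

# Hypothetical CSV data: "key" is complete, the other columns have gaps.
csv_text = """key,columnA,columnB
1,10,100
3,,200
6,30,
10,40,400
15,50,500
21,60,600
"""

# Read every column as nullable Int64 so missing values become <NA>, not floats.
df = pd.read_csv(io.StringIO(csv_text), dtype="Int64")

# Use a plain int64 copy of "key" as the index (only valid if "key" has no <NA>).
df.index = df["key"].astype("int64")

subset = df.loc[2:16]
print(subset)
```

Note that the key column stays in the frame as Int64 here; only the index is int64. If key did contain <NA>, the astype('int64') call would raise, so those rows would have to be dropped or filled first.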
Upvotes: 2