Reputation: 4429
not sure what the problem is here... all i want is the first and only element in this series
>>> a
1 0-5fffd6b57084003b1b582ff1e56855a6!1-AB8769635...
Name: id, dtype: object
>>> len (a)
1
>>> type(a)
<class 'pandas.core.series.Series'>
>>> a[0]
Traceback (most recent call last):
File "<pyshell#1>", line 1, in <module>
a[0]
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 601, in __getitem__
result = self.index.get_value(self, key)
File "C:\Python27\lib\site-packages\pandas\core\indexes\base.py", line 2477, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas\_libs\index.pyx", line 98, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4404)
File "pandas\_libs\index.pyx", line 106, in pandas._libs.index.IndexEngine.get_value (pandas\_libs\index.c:4087)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
File "pandas\_libs\hashtable_class_helper.pxi", line 759, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:14031)
File "pandas\_libs\hashtable_class_helper.pxi", line 765, in pandas._libs.hashtable.Int64HashTable.get_item (pandas\_libs\hashtable.c:13975)
KeyError: 0L
Why isn't that working? And how do I get the first element?
Upvotes: 3
Views: 10551
Reputation: 3710
Look at the following code:
import pandas as pd
import numpy as np
data1 = pd.Series(['a','b','c'],index=['1','3','5'])
data2 = pd.Series(['a','b','c'],index=[1,3,5])
print('keys data1: '+str(data1.keys()))
print('keys data2: '+str(data2.keys()))
print('base data1: '+str(data1.index.values))  # Index.base is not available in newer pandas; .values prints the same array
print('base data2: '+str(data2.index.values))
print(data1['1':'3']) # Here we use the dictionary like slicing
print(data1[1:3]) # Here we use the integer like slicing
print(data2[1:3]) # Here we use the integer like slicing
keys data1: Index(['1', '3', '5'], dtype='object')
keys data2: Int64Index([1, 3, 5], dtype='int64')
base data1: ['1' '3' '5']
base data2: [1 3 5]
1 a
3 b
dtype: object
3 b
5 c
dtype: object
3 b
5 c
dtype: object
For data1, the dtype of the index is object; for data2 it is int64. In his Python Data Science Handbook, Jake VanderPlas writes: "a Series object acts in many ways like a one-dimensional NumPy array, and in many ways like a standard Python dictionary". Hence, if the index is of type object, as in the case of data1, we have two different ways to access the values:
1. By dictionary-like slicing/indexing:
data1['1':'3'] --> a,b
2. By integer-like slicing/indexing:
data1[1:3] --> b,c
If the index dtype is int64, as in the case of data2, an integer key is ambiguous: it could be a label or a position. pandas resolves the two cases differently. An integer slice such as data2[1:3] is treated as positional, so it returns b,c just as data1[1:3] does. A single integer such as data2[1] is treated as a label, so it returns a, and data2[0] raises a KeyError because there is no label 0. That is exactly what happens with a[0] in the question.
VanderPlas also points out one critical thing to keep in mind here:
"Notice that when you are slicing with an explicit index (i.e., data['a':'c']), the final index is included in the slice, while when you’re slicing with an implicit index (i.e., data[0:2]), the final index is excluded from the slice.[...] These slicing and indexing conventions can be a source of confusion."
To overcome this confusion, you can use loc for label-based slicing/indexing and iloc for position-based slicing/indexing, like this:
import pandas as pd
import numpy as np
data1 = pd.Series(['a','b','c'],index=['1','3','5'])
data2 = pd.Series(['a','b','c'],index=[1,3,5])
print('data1.iloc[0:2]: ',str(data1.iloc[0:2]),sep='\n',end='\n\n')
# print(data1.loc[1:3]) --> Raises an error because the index labels are the strings '1' and '3', not the integers 1 and 3
print('data1.loc["1":"3"]: ',str(data1.loc['1':'3']),sep='\n',end='\n\n')
print('data2.iloc[0:2]: ',str(data2.iloc[0:2]),sep='\n',end='\n\n')
print('data2.loc[1:3]: ',str(data2.loc[1:3]),sep='\n',end='\n\n')  # Note: unlike ordinary Python slices, both the start and the stop are included
data1.iloc[0:2]:
1 a
3 b
dtype: object
data1.loc["1":"3"]:
1 a
3 b
dtype: object
data2.iloc[0:2]:
1 a
3 b
dtype: object
data2.loc[1:3]:
1 a
3 b
dtype: object
So data2.loc[1:3] looks up the labels 1 and 3 in the index and returns everything from the first to the second, including both endpoints, while data2.iloc[0:2] returns the elements from position 0 up to, but not including, position 2.
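In the example above, loc and iloc happen to return the same rows, which can hide the difference. Here is a small sketch, reusing data2 from above, with a slice where the two clearly diverge. The variable names by_label, by_position and first are only for illustration.

```python
import pandas as pd

data2 = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# .loc treats 3 and 5 as labels and includes both endpoints.
by_label = data2.loc[3:5]

# .iloc treats 3 and 5 as positions; the series has only positions 0..2,
# so the slice is empty (out-of-range slices do not raise).
by_position = data2.iloc[3:5]

# .iloc[0] is the first element no matter what the labels are.
first = data2.iloc[0]

print(by_label.tolist())     # ['b', 'c']
print(by_position.tolist())  # []
print(first)                 # a
```

So for the question, a.iloc[0] is the way to get "the first element" without caring about the index labels.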
Upvotes: 2
Reputation:
When the index is integer, a plain integer key in [] is not treated as a position, because the selection would be ambiguous (should it return based on label or position?). pandas treats it as a label instead. You need to either use a.iloc[0] explicitly or pass the label, a[1].
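The following works because the index type is object, so an integer key cannot be a label and pandas falls back to position. Note that newer pandas releases deprecate this fallback, so a.iloc[0] is the safer spelling there too: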
a = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
a
Out:
a 1
b 2
c 3
dtype: int64
a[0]
Out: 1
But for integer index, things are different:
a = pd.Series([1, 2, 3], index=[2, 3, 4])
a[2] # returns the first entry - label based
Out: 1
a[1] # raises a KeyError
KeyError: 1
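Applied to the question, a minimal sketch could look like this. The value 'some-id' is a made-up placeholder for the long id string, and the index label 1 is taken from the printed series in the question.

```python
import pandas as pd

# Stand-in for the series in the question: one element whose label is 1, not 0.
a = pd.Series(['some-id'], index=[1], name='id')

try:
    a[0]  # the integer key is looked up as a label, and there is no label 0
    raised = False
except KeyError:
    raised = True

first_by_position = a.iloc[0]  # first element by position
first_by_label = a.loc[1]      # the same element, addressed by its label

print(raised)             # True
print(first_by_position)  # some-id
print(first_by_label)     # some-id
```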
Upvotes: 10