Reputation: 600
I have 2 data frames. First dataframe has numbers as index. Second dataframe has datetime as index. The slice operator (:) behaves differently on these dataframes.
Case 1
>>> df = pd.DataFrame({'A':[1,2,3]}, index=[0,1,2])
>>> df
A
0 1
1 2
2 3
>>> df[0:2]
A
0 1
1 2
Case 2
>>> a = dt.datetime(2000,1,1)
>>> b = dt.datetime(2000,1,2)
>>> c = dt.datetime(2000,1,3)
>>> df = pd.DataFrame({'A':[1,2,3]}, index = [a,b,c])
>>> df
A
2000-01-01 1
2000-01-02 2
2000-01-03 3
>>> df[a:b]
A
2000-01-01 1
2000-01-02 2
Why does the final row get excluded in case 1 but not in case 2?
Upvotes: 1
Views: 189
Reputation: 862611
Don't rely on plain [] slicing here; it is better to use loc for consistency:
import datetime
import pandas as pd
df = pd.DataFrame({'A':[1,2,3]}, index=[0,1,2])
print (df.loc[0:2])
A
0 1
1 2
2 3
a = datetime.datetime(2000,1,1)
b = datetime.datetime(2000,1,2)
c = datetime.datetime(2000,1,3)
df = pd.DataFrame({'A':[1,2,3]}, index = [a,b,c])
print (df.loc[a:b])
A
2000-01-01 1
2000-01-02 2
The reason why the last row is omitted can be found in the docs:
With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.
print (df[0:2])
A
0 1
1 2
For selecting by datetimes, exact indexing is used:
... In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.
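To make the difference explicit, here is a minimal sketch that spells out the intent with iloc (positional, end excluded) and loc (label-based, both ends included) on both kinds of index. The variable names (int_df, date_df and so on) are only made up for this illustration:

```python
import datetime as dt
import pandas as pd

# Same data, once with an integer index and once with a datetime index
int_df = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
a, b, c = (dt.datetime(2000, 1, day) for day in (1, 2, 3))
date_df = pd.DataFrame({'A': [1, 2, 3]}, index=[a, b, c])

# .iloc slices by position and excludes the end, like a Python list
int_by_pos = int_df.iloc[0:2]
date_by_pos = date_df.iloc[0:2]

# .loc slices by label and includes both endpoints
int_by_label = int_df.loc[0:2]
date_by_label = date_df.loc[a:b]

print(int_by_pos)
print(int_by_label)
print(date_by_pos)
print(date_by_label)
```

With iloc and loc the result no longer depends on how plain [] decides to interpret the slice bounds.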
Upvotes: 5
Reputation: 2838
Okay, to understand this, let's first run an experiment:
import pandas as pd
import datetime as dt
a = dt.datetime(2000,1,1)
b = dt.datetime(2000,1,2)
c = dt.datetime(2000,1,3)
df = pd.DataFrame({'A':[1,2,3]}, index=[a,b,c])
Now let's use
df[0:2]
Which gives us
A
2000-01-01 1
2000-01-02 2
This behavior is consistent with Python list slicing, but if you use
df[a:c]
You get
A
2000-01-01 1
2000-01-02 2
2000-01-03 3
This is because df[a:c] is not treated as a positional slice. The slice bounds are not integers, so pandas looks them up as labels in the index, and label-based slicing also includes the last element. So if you slice with integers, pandas uses ordinary positional slicing, whereas if you slice with labels such as datetimes, this inclusive behavior is observed. As already mentioned in the answer by jezrael, it is better to use loc, as that is more consistent across the board.
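As a small sketch of the positional behavior with integers, here is a frame whose integer labels differ from the row positions. The labels 10, 20, 30 and the variable names are made up for this illustration:

```python
import pandas as pd

# Integer labels that deliberately do not match the row positions
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 30])

# Plain [] with integer bounds: rows at positions 0 and 1, end excluded
by_position = df[0:2]

# .loc with the same kind of slice: labels 10 through 20, both included
by_label = df.loc[10:20]

# .loc[0:2] looks for labels between 0 and 2, and there are none
no_such_labels = df.loc[0:2]

print(by_position)
print(by_label)
print(no_such_labels)
```

Here df[0:2] and df.loc[0:2] give different results on the same frame, which is one more reason to state the intent with loc or iloc.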
Upvotes: 1