Inconsistent slicing [:] behavior on Pandas Dataframes

Question

I have two DataFrames. The first has an integer index; the second has a datetime index. The slice operator (:) behaves differently on them.

Case 1

>>> df = pd.DataFrame({'A':[1,2,3]}, index=[0,1,2])
>>> df
   A
0  1
1  2
2  3
>>> df[0:2]
   A
0  1
1  2

Case 2

>>> a = dt.datetime(2000,1,1)
>>> b = dt.datetime(2000,1,2)
>>> c = dt.datetime(2000,1,3)
>>> df = pd.DataFrame({'A':[1,2,3]}, index = [a,b,c])
>>> df
            A
2000-01-01  1
2000-01-02  2
2000-01-03  3
>>> df[a:b]
            A
2000-01-01  1
2000-01-02  2

Why does the final row get excluded in case 1 but not in case 2?

jezrael · Accepted Answer

Don't slice with plain []; it is better to use loc for consistency, because loc always slices by label and includes both endpoints:

import datetime
import pandas as pd

df = pd.DataFrame({'A':[1,2,3]}, index=[0,1,2])

print (df.loc[0:2])
   A
0  1
1  2
2  3

a = datetime.datetime(2000,1,1)
b = datetime.datetime(2000,1,2)
c = datetime.datetime(2000,1,3)
df = pd.DataFrame({'A':[1,2,3]}, index = [a,b,c])

print (df.loc[a:b])
            A
2000-01-01  1
2000-01-02  2

The reason the last row is omitted in case 1 can be found in the docs:

With DataFrame, slicing inside of [] slices the rows. This is provided largely as a convenience since it is such a common operation.

print (df[0:2])
   A
0  1
1  2

When selecting by datetimes, exact indexing is used:

... In contrast, indexing with Timestamp or datetime objects is exact, because the objects have exact meaning. These also follow the semantics of including both endpoints.
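
To make the contrast explicit, here is a minimal sketch (the variable names int_df, date_df, pos_int and so on are only for illustration and are not from the question). It assumes a reasonably recent pandas and shows that iloc slices by position and excludes the end, while loc slices by label and includes both endpoints, regardless of the index type:

```python
import datetime

import pandas as pd

# Same data as in the question, once with an integer index
# and once with a datetime index.
int_df = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])

a = datetime.datetime(2000, 1, 1)
b = datetime.datetime(2000, 1, 2)
c = datetime.datetime(2000, 1, 3)
date_df = pd.DataFrame({'A': [1, 2, 3]}, index=[a, b, c])

# .iloc slices by position: the end position is excluded.
pos_int = int_df.iloc[0:2]
pos_date = date_df.iloc[0:2]

# .loc slices by label: both endpoint labels are included.
lab_int = int_df.loc[0:1]
lab_date = date_df.loc[a:b]

print(pos_int['A'].tolist(), pos_date['A'].tolist())
print(lab_int['A'].tolist(), lab_date['A'].tolist())
```

All four selections return the first two rows, so choosing loc or iloc explicitly avoids having to remember which rule plain [] applies for a given index.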
