Why is Pandas not returing a scalar/string instead of a Series when accessing a DataFrame with both a column and an index value?

Question

Assuming the following 3 Pandas Dataframes:

df1:

   ask.c    ask.h    ask.l    ask.o    bid.c    bid.h    bid.l    bid.o complete    mid.c    mid.h    mid.l    mid.o  volume     date 

0  1.39146  1.39148  1.39146  1.39146  1.39138  1.39140  1.39136  1.39136     True  1.39142  1.39144  1.39141  1.39141      5    2014-03-19 09:00:00
.  .  
.  . 
.  .

df2:

   ask.c    ask.h    ask.l    ask.o    bid.c    bid.h    bid.l    bid.o complete    mid.c    mid.h    mid.l    mid.o  volume   date

0  1.39147  1.39148  1.39147  1.39147  1.39138  1.39138  1.39137  1.39137     True  1.39142  1.39142  1.39142  1.39142    4    2014-03-19 09:00:05
1  1.39149  1.39149  1.39148  1.39149  1.39138  1.39141  1.39138  1.39141     True  1.39143  1.39145  1.39143  1.39145    3    2014-03-19 09:00:10
.  .  
.  . 
.  .

df is built with:

df = pandas.concat([df1,df2], ignore_index=True)
df.drop_duplicates("date", keep="first", inplace=True)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

Why is df["ask.c"]["2014-03-19 09:00:00"] returning a Panda Series instead of a scalar/string value ?

print(type(df1["ask.c"]["2014-03-19 09:00:00"]))¬                                                                              


print(df1["ask.c"]["2014-03-19 09:00:00"])¬
Date
2014-03-19 09:00:00    1.39146
Name: ask.c, dtype: object

EDIT:

@IIya V.Schurov's explanation got me on the right track but we are not quite fully there yet. Pandas has something called "DatetimeIndex Partial String Indexing" which is a form slicing.

Hence the expression df["2016-12-07"] would return a series rather than a scalar. The documentation mentions that df["2016-12-07 09:00:00"] isnot a slice nor resolve to one and will raise a KeyError but in my case, it does not raise any KeyError and does resolve like a slice.

Ilya V. Schurov · Accepted Answer

Adding a new answer to keep previous one for historical purposes.

Minimal example

import pandas as pd
series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-07 09:00:00',
                                                '2016-12-07 09:00:05',
                                                '2016-12-08 09:00:10']))
print(type(series["2016-12-07 09:00:00"]))
#

The documentation

As the docs discuss, one can use string on DateTimeIndex's __getitem__/[] operator to get so-called Partial String Indexing. It works like this:

series['2016-12-07']
# give me all records for December, 07 2016

2016-12-07 09:00:00    1
2016-12-07 09:00:05    2
dtype: int64

One can explicitly provide a timestamp to get scalar instead of Series:

print(type(series[pd.to_datetime("2016-12-07 09:00:00")]))
#

Resolutions

String used as subscript can be detected as coerceable to slice or not. In the former case, an output of [] is Series. Otherwise, it is scalar (or KeyError). Moreover, if string is detected as coercable to slice, it can be used to index DataFrame as a slice (i.e. row-wise, not column-wise):

df = pd.DataFrame(series)
print(df["2016-12-07"])

                     0
2016-12-07 09:00:00  1
2016-12-07 09:00:05  2

When string can be considered as a slice? It depends on the resolution of this string and DateTimeIndex instance's resolution. Examples:

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-06 23:59:00'
                                                '2016-12-07 01:00:00',
                                                '2016-12-07 01:01:00',
                                                '2016-12-07 01:02:00']))
print(series.index.resolution)
# minute
print(type(series["2016-12-06"]))
# 
print(type(series["2016-12-07"]))
# 
print(type(series["2016-12-07 01"]))
# 
print(type(series["2016-12-07 01:01"]))
#

So generally string is a slice if its resolution is less than the resolution of DateTimeIndex instance. However, there is an exception for second resolution.

series = pd.Series([1, 2, 3], pd.DatetimeIndex(['2016-12-06 23:59:00'
                                                '2016-12-07 01:00:00',
                                                '2016-12-07 01:01:00',
                                                '2016-12-07 01:02:01']))
print(series.index.resolution)
# second
# this is due to 1 second at the last timestamp
print(type(series["2016-12-07 01:01:00"]))
#

I can't answer, why this decision was made (see the discussion here for the references in the code), but it definitely the OP's case (as df2['date']'s resolution is second). So every timestamp with second resolution will be resolved to slice.

UPD. Finally the behavior of slicing for second resolution is fixed (in currrent master, pending to 0.20 release).

Why is Pandas not returing a scalar/string instead of a Series when accessing a DataFrame with both a column and an index value?

Answers (2)

Minimal example

The documentation

Resolutions

Related Questions