Reputation: 4470

Index pandas data frame by index

sample dataframe:

data = [["2011-01-01",23],["2011-01-02",33],["2011-01-03",43],["2011-01-04",53]]
df= pd.DataFrame(data,columns = ["A","B"])
df["A"] = pd.to_datetime(df["A"])
df.index = df["A"]
del df["A"]

OP:

            B
A   
2011-01-01  23
2011-01-02  33
2011-01-03  43
2011-01-04  53

I am trying to split this data frame in two parts using the following code:

part1 = df.loc[:"2011-01-02"]

op:

            B
A   
2011-01-01  23
2011-01-02  33

part2:

part2 = df.loc["2011-01-02":]

op:

            B
A   
2011-01-02  33
2011-01-03  43
2011-01-04  53

But the row with index "2011-01-02" is in both the parts(part1 and part2). Any suggestions for pandas 1-liners to get that row in just 1 part and not both.

Upvotes: 2

Answers (4)

BENY

Reputation: 323226

Using get_loc with iloc

df.iloc[:df.index.get_loc('2011-01-02')]
                    A   B
A                        
2011-01-01 2011-01-01  23

df.iloc[df.index.get_loc('2011-01-02'):]
                    A   B
A                        
2011-01-02 2011-01-02  33
2011-01-03 2011-01-03  43
2011-01-04 2011-01-04  53

Upvotes: 1

Derek Eden

Reputation: 4618

slice = df.index > "2011-01-02"
df[slice]
df[~slice]

Upvotes: 2

Dyno Fu

Reputation: 9044

The behavior is expected (I don't know it until today either)

This type of slicing will work on a DataFrame with a DatetimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date: from http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#indexing.

And about the label slicing behavior

Note that contrary to usual python slices, both the start and the stop are included https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc

In [16]: df[df.index < '2011-01-02']
Out[16]:
             B
A
2011-01-01  23

In [17]: df[df.index >= '2011-01-02']
Out[17]:
             B
A
2011-01-02  33
2011-01-03  43
2011-01-04  53

In [18]: df[df.index > '2011-01-02']
Out[18]:
             B
A
2011-01-03  43
2011-01-04  53

Upvotes: 2

Elias Kassell

Reputation: 180

Instead of part2 = df.loc["2011-01-02":] use

part2 = df.loc["2011-01-02":].iloc[1:]

             B
A             
2011-01-03  43
2011-01-04  53

Upvotes: 1

Index pandas data frame by index

Answers (4)

Related Questions