Reputation: 4470
sample dataframe:
data = [["2011-01-01",23],["2011-01-02",33],["2011-01-03",43],["2011-01-04",53]]
df= pd.DataFrame(data,columns = ["A","B"])
df["A"] = pd.to_datetime(df["A"])
df.index = df["A"]
del df["A"]
OP:
B
A
2011-01-01 23
2011-01-02 33
2011-01-03 43
2011-01-04 53
I am trying to split this data frame in two parts using the following code:
part1 = df.loc[:"2011-01-02"]
op:
B
A
2011-01-01 23
2011-01-02 33
part2:
part2 = df.loc["2011-01-02":]
op:
B
A
2011-01-02 33
2011-01-03 43
2011-01-04 53
But the row with index "2011-01-02" is in both the parts(part1 and part2). Any suggestions for pandas 1-liners to get that row in just 1 part and not both.
Upvotes: 2
Views: 59
Reputation: 323226
Using get_loc
with iloc
df.iloc[:df.index.get_loc('2011-01-02')]
A B
A
2011-01-01 2011-01-01 23
df.iloc[df.index.get_loc('2011-01-02'):]
A B
A
2011-01-02 2011-01-02 33
2011-01-03 2011-01-03 43
2011-01-04 2011-01-04 53
Upvotes: 1
Reputation: 9044
The behavior is expected (I don't know it until today either)
This type of slicing will work on a DataFrame with a DatetimeIndex as well. Since the partial string selection is a form of label slicing, the endpoints will be included. This would include matching times on an included date: from http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#indexing.
And about the label slicing behavior
Note that contrary to usual python slices, both the start and the stop are included https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc
In [16]: df[df.index < '2011-01-02']
Out[16]:
B
A
2011-01-01 23
In [17]: df[df.index >= '2011-01-02']
Out[17]:
B
A
2011-01-02 33
2011-01-03 43
2011-01-04 53
In [18]: df[df.index > '2011-01-02']
Out[18]:
B
A
2011-01-03 43
2011-01-04 53
Upvotes: 2
Reputation: 180
Instead of part2 = df.loc["2011-01-02":]
use
part2 = df.loc["2011-01-02":].iloc[1:]
B
A
2011-01-03 43
2011-01-04 53
Upvotes: 1