Reputation: 5965
Let's say I have the following DataFrame:
df = pd.DataFrame({'item': ['Subway', 'Pasta', 'Chipotle'],
'cost': [10, 5, 9],
'date': ['2017-12-01', '2017-11-01', '2017-10-01']})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
I'm able to get all items in 2017-10
(only one item in this case):
print(df.set_index('date')['2017-10'])
According to the pandas documentation and this SO answer, I should be able to get all items from 2017-10
to 2017-11
(2 items in this case) with the following command but I'm getting an empty DataFrame:
print(df.set_index('date')['2017-10':'2017-11'])
Any idea what I'm doing wrong here (I'm using pandas version 0.21.0
)?
Moreover, is there an efficient way I can get all items in 2017-10
and 2017-12
(skipping 2017-11
)? I've come up with the following solution but I shouldn't have to create new columns like so:
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
print(df[((df.month==10) & (df.year==2017) | (df.month==12) & (df.year==2017))])
Upvotes: 0
Views: 2085
Reputation: 339
An alternative approach may be to use boolean indexing.
Here you provide statements that must be true in order for the rows to be returned.
For your second question, this would be:
df_October_and_December = df.ix[((df['date'] >= '2017-10-01') & (df['date'] <= '2017-10-31')) | ((df['date'] >= '2017-12-01') & (df['date'] <= '2017-12-31')) ,:]
The more elegant version of what you want is:
df_October_and_December = df.ix[(df['date'].dt.month.isin([10,12])) ,:]
I tend to use .ix referencing given it's flexibility and refine to .loc or .iloc if the application allows.
Upvotes: 0
Reputation: 21264
First use set_index()
with DatetimeIndex
. Then you can use the indexing approach you wanted.
df.set_index(pd.DatetimeIndex(df.date), inplace=True)
df.sort_index().loc['2017-10':'2017-11']
cost date item
date
2017-10-01 9 2017-10-01 Chipotle
2017-11-01 5 2017-11-01 Pasta
With respect to your second question, you can also access the month
property once you have a DatetimeIndex
.
df.loc[df.index.month.isin([10,12])]
cost date item
date
2017-12-01 10 2017-12-01 Subway
2017-10-01 9 2017-10-01 Chipotle
(For the second part, to index by year as well, add & df.index.year == 2017
)
Upvotes: 1
Reputation: 2692
I reversed the order I was searching for the items so:
import pandas as pd
df = pd.DataFrame({'item': ['Subway', 'Pasta', 'Chipotle'],
'cost': [10, 5, 9],
'date': ['2017-12-01', '2017-11-01', '2017-10-01']})
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
print(df.set_index('date')['2017-11':'2017-10'])
For your 'date' it went from high to low. By switching them I got this output:
cost item
date
2017-11-01 5 Pasta
2017-10-01 9 Chipotle
Upvotes: 1