Reputation: 803
How do I subset a pandas dataframe to obtain rows with data from particular months?
I have a date column in the 2010-01-01 format.
If the dates were the index, I would use
df.ix[date1:date2]
But what do I do if the data is in a column?
Upvotes: 2
Views: 4226
Reputation: 32244
Selecting ranges in DataFrames can be done by using masks.
Masks are just normal pd.Series objects containing True and False elements.
General example using minions:
import pandas as pd

df_minions = pd.DataFrame({
    'color': ['Red', 'Green', 'Blue', 'Brown'] * 2,
    'name': ['Burnie', 'Stinky', 'Swimmy', 'Bashy', 'Flamie', 'Stabbie', 'Blubb', 'Smashie']})
color name
0 Red Burnie
1 Green Stinky
2 Blue Swimmy
3 Brown Bashy
4 Red Flamie
5 Green Stabbie
6 Blue Blubb
7 Brown Smashie
Selecting all brown minions can easily be done like this:
brown_minion_mask = df_minions['color'] == 'Brown'
0 False
1 False
2 False
3 True
4 False
5 False
6 False
7 True
df_minions[brown_minion_mask]
color name
3 Brown Bashy
7 Brown Smashie
Now for your specific question about selecting on the month of the date:
First I will add a spawned column which is full of dates:
from datetime import datetime
df_minions['spawned'] = [datetime(2015, m, 5) for m in range(4, 6)] * 4
color name spawned
0 Red Burnie 2015-04-05
1 Green Stinky 2015-05-05
2 Blue Swimmy 2015-04-05
3 Brown Bashy 2015-05-05
4 Red Flamie 2015-04-05
5 Green Stabbie 2015-05-05
6 Blue Blubb 2015-04-05
7 Brown Smashie 2015-05-05
Now we can use a very handy property of a pd.Series that holds datetimes, which is the .dt accessor:
df_minions.spawned.dt.month
0 4
1 5
2 4
3 5
4 4
5 5
6 4
7 5
We can use this operation to mask our dataframe, just in the same way as we did with the color of our minions.
may_minion_mask = df_minions.spawned.dt.month == 5
df_minions[may_minion_mask]
color name spawned
1 Green Stinky 2015-05-05
3 Brown Bashy 2015-05-05
5 Green Stabbie 2015-05-05
7 Brown Smashie 2015-05-05
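The question mentions dates written like 2010-01-01 and asks about several months, so here is a minimal sketch of that case. It assumes a hypothetical frame df with a string column named date (neither name comes from the question). If the column holds strings rather than datetimes, the .dt accessor will raise an error, so the sketch converts with pd.to_datetime first and then uses isin to keep more than one month at once.

```python
import pandas as pd

# Hypothetical data: dates stored as strings in YYYY-MM-DD format
df = pd.DataFrame({
    'date': ['2010-01-01', '2010-02-15', '2010-03-20', '2011-01-07'],
    'value': [1, 2, 3, 4]})

# .dt only works on datetime-like columns, so convert the strings first
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Keep rows from January or March, regardless of the year
month_mask = df['date'].dt.month.isin([1, 3])
jan_mar = df[month_mask]
```

Note that a month mask on its own ignores the year; combine it with df['date'].dt.year == 2010 using & if only one year is wanted.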
You can of course use any expression that produces a boolean Series as a mask:
not_spawned_in_january = df_minions.spawned.dt.month != 1
summer_minions = ((df_minions.spawned > datetime(2015, 5, 15)) &
                  (df_minions.spawned < datetime(2015, 9, 15)))
name_endswith_y = df_minions.name.str.endswith('y')
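For the date1:date2 style of selection from the question, a range mask on the column should do the same job as slicing a date index. This is only a sketch: df, date, value, date1 and date2 are hypothetical names chosen for illustration. Series.between includes both endpoints by default.

```python
import pandas as pd

# Hypothetical data with a datetime column (not the index)
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-02-15', '2010-03-20', '2010-04-02']),
    'value': [1, 2, 3, 4]})

date1 = pd.Timestamp('2010-02-01')
date2 = pd.Timestamp('2010-03-31')

# Mask on the column: both endpoints are included by default
in_range = df[df['date'].between(date1, date2)]

# Alternative: move the column into a sorted index and slice by label
by_index = df.set_index('date').sort_index().loc[date1:date2]
```

Both approaches select the same rows here; the mask version leaves the original index untouched, which is usually convenient when the dates are just one column among many.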
Upvotes: 6