user2491254

Reputation: 803

Subset pandas dataframe by particular months of dates in column?

How do I subset a pandas dataframe to obtain rows with data from particular months?

I have a date column in the 2010-01-01 format.

If the dates were the index, I would use

df.loc[date1:date2]

But what do I do if the data is in a column?

Upvotes: 2

Views: 4226

Answers (1)

firelynx

Reputation: 32244

Selecting ranges in DataFrames can be done by using masks.

A mask is just an ordinary pd.Series of True and False values, one per row of the DataFrame.

General example using minions:

import pandas as pd

df_minions = pd.DataFrame({
    'color': ['Red', 'Green', 'Blue', 'Brown'] * 2,
    'name': ['Burnie', 'Stinky', 'Swimmy', 'Bashy', 'Flamie', 'Stabbie', 'Blubb', 'Smashie']})
   color     name
0    Red   Burnie
1  Green   Stinky
2   Blue   Swimmy
3  Brown    Bashy
4    Red   Flamie
5  Green  Stabbie
6   Blue    Blubb
7  Brown  Smashie

Selecting all brown minions can easily be done like this:

brown_minion_mask = df_minions['color'] == 'Brown'
0    False
1    False
2    False
3     True
4    False
5    False
6    False
7     True

df_minions[brown_minion_mask]
   color     name
3  Brown    Bashy
7  Brown  Smashie

Now for your specific question about selecting on the month of the date:

First I will add a spawned column full of dates:

from datetime import datetime

df_minions['spawned'] = [datetime(2015, m, 5) for m in range(4, 6)] * 4
   color     name    spawned
0    Red   Burnie 2015-04-05
1  Green   Stinky 2015-05-05
2   Blue   Swimmy 2015-04-05
3  Brown    Bashy 2015-05-05
4    Red   Flamie 2015-04-05
5  Green  Stabbie 2015-05-05
6   Blue    Blubb 2015-04-05
7  Brown  Smashie 2015-05-05

Now we can use a very handy property of datetime-typed pd.Series, the .dt accessor:

df_minions.spawned.dt.month
0    4
1    5
2    4
3    5
4    4
5    5
6    4
7    5
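Note that .dt is only available when the column actually holds datetime values. If the dates are stored as strings such as 2010-01-01, as the question suggests, they would probably need converting first. A minimal sketch, assuming a hypothetical frame df with a string column named date:

```python
import pandas as pd

# Hypothetical frame whose dates are stored as plain strings
df = pd.DataFrame({'date': ['2010-01-01', '2010-02-15', '2010-03-31'],
                   'value': [1, 2, 3]})

# Convert the strings to datetime64 so the .dt accessor becomes available
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Month number (1-12) for every row
months = df['date'].dt.month
```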

We can use this to mask our dataframe, in just the same way as we did with the color of our minions.

may_minion_mask = df_minions.spawned.dt.month == 5
df_minions[may_minion_mask]
   color     name    spawned
1  Green   Stinky 2015-05-05
3  Brown    Bashy 2015-05-05
5  Green  Stabbie 2015-05-05
7  Brown  Smashie 2015-05-05
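Since the question asks about particular months (plural), one way to extend the single-month mask is Series.isin, which tests each month number against a list. This is a sketch on a small made-up frame; the names df, wanted_months and spring_minions are only illustrative:

```python
import pandas as pd
from datetime import datetime

# Small illustrative frame with one row per month of interest
df = pd.DataFrame({
    'name': ['Burnie', 'Stinky', 'Swimmy', 'Bashy'],
    'spawned': [datetime(2015, 1, 5), datetime(2015, 4, 5),
                datetime(2015, 5, 5), datetime(2015, 9, 5)]})

# Keep rows whose month is any of the wanted months (April or May here)
wanted_months = [4, 5]
spring_mask = df['spawned'].dt.month.isin(wanted_months)
spring_minions = df[spring_mask]
```

Because the mask looks only at the month number, it would match those months in every year present in the column.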

You can of course build a mask from any operation that yields a boolean Series:

not_spawned_in_january = df_minions.spawned.dt.month != 1
summer_minions = ((df_minions.spawned > datetime(2015, 5, 15)) &
                  (df_minions.spawned < datetime(2015, 9, 15)))
name_endswith_y = df_minions.name.str.endswith('y')
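For the column equivalent of slicing an index between two dates, as in the question, Series.between is one option. A sketch under the assumption that the column is already datetime-typed; df, date1 and date2 are hypothetical names:

```python
import pandas as pd

# Hypothetical frame with a datetime column that is not the index
df = pd.DataFrame({
    'date': pd.to_datetime(['2010-01-01', '2010-02-15',
                            '2010-03-31', '2010-05-01']),
    'value': [1, 2, 3, 4]})

date1 = pd.Timestamp('2010-02-01')
date2 = pd.Timestamp('2010-03-31')

# between() includes both endpoints by default
in_range = df[df['date'].between(date1, date2)]
```

The same selection could be written as two comparisons joined with &, which makes it easier to choose strict or inclusive bounds for each end.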

Upvotes: 6
