Reputation: 107
I have a dataframe of surface weather observations (fzraHrObs
) organized by a station identifier code and date. fzraHrObs
has several columns of weather data. The station code and date (datetime objects) look like:
usaf dat
716270 2014-11-23 12:00:00
2015-12-20 08:00:00
2015-12-20 09:00:00
2015-12-21 04:00:00
2015-12-28 03:00:00
716280 2015-12-19 08:00:00
2015-12-19 08:00:00
I would like to get a count of the number of unique dates (days) per year for each station - i.e. the number of days of obs per year at each station. In my example above this would give me:
usaf Year Count
716270 2014 1
2015 3
716280 2014 0
2015 1
I've tried using groupby and grouping by station, year, and date:
grouped = fzraHrObs['dat'].groupby(fzraHrObs['usaf'], fzraHrObs.dat.dt.year, fzraHrObs.dat.dt.date])
Count, size, nunique, etc. on this just gives me the number of obs on each date, not the number of dates themselves per year. Any suggestions on getting what I want here?
Upvotes: 7
Views: 6530
Reputation: 16241
The following should work:
df.groupby(['usaf', df.dat.dt.year])['dat'].apply(lambda s: s.dt.date.nunique())
What I did differently is group by two levels only, then use the nunique
method of pandas series to count the number of unique dates in each group.
Upvotes: 2
Reputation: 214927
Could be something like this, group the date by usaf
and year
and then count the number of unique values:
import pandas as pd
df.dat.apply(lambda dt: dt.date()).groupby([df.usaf, df.dat.apply(lambda dt: dt.year)]).nunique()
# usaf dat
# 716270 2014 1
# 2015 3
# 716280 2015 1
# Name: dat, dtype: int64
Upvotes: 2