Reputation: 25
I have data formatted like this (where Timestamp is in datetime format):
Timestamp Location
7/24/2017 14:55:19 Location_A
7/24/2017 15:09:16 Location_B
7/25/2017 14:05:45 Location_B
7/26/2017 8:34:40 Location_A
7/26/2017 8:45:56 Location_B
7/26/2017 9:34:11 Location_A
7/26/2017 10:03:14 Location_B
7/27/2017 9:26:14 Location_B
7/27/2017 9:37:26 Location_B
7/31/2017 10:22:46 Location_A
7/31/2017 10:59:02 Location_B
Each time a timestamp occurs for either Location_A or Location_B, that means the location was open that day. I'd like to count the number of days each location was open, so that the output looks like this:
Location Days_Open
Location_A 3
Location_B 5
I know I need to use groupby and likely a lambda function, but I'm not having much luck. I appreciate any help.
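For reference, the sample data above can be rebuilt with something like this (a rough sketch; the timestamps are parsed from the strings shown in the table):
import pandas as pd

df = pd.DataFrame({
    'Timestamp': pd.to_datetime([
        '7/24/2017 14:55:19', '7/24/2017 15:09:16', '7/25/2017 14:05:45',
        '7/26/2017 8:34:40', '7/26/2017 8:45:56', '7/26/2017 9:34:11',
        '7/26/2017 10:03:14', '7/27/2017 9:26:14', '7/27/2017 9:37:26',
        '7/31/2017 10:22:46', '7/31/2017 10:59:02',
    ]),
    'Location': ['Location_A', 'Location_B', 'Location_B', 'Location_A',
                 'Location_B', 'Location_A', 'Location_B', 'Location_B',
                 'Location_B', 'Location_A', 'Location_B'],
})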
Upvotes: 1
Views: 416
Reputation: 18201
You could do something like
(pd.crosstab(df.index.date, df.Location) > 0).sum()
With your data:
In [64]: df
Out[64]:
Location
Timestamp
2017-07-24 14:55:19 Location_A
2017-07-24 15:09:16 Location_B
2017-07-25 14:05:45 Location_B
2017-07-26 08:34:40 Location_A
2017-07-26 08:45:56 Location_B
2017-07-26 09:34:11 Location_A
2017-07-26 10:03:14 Location_B
2017-07-27 09:26:14 Location_B
2017-07-27 09:37:26 Location_B
2017-07-31 10:22:46 Location_A
2017-07-31 10:59:02 Location_B
In [65]: (pd.crosstab(df.index.date, df.Location) > 0).sum()
Out[65]:
Location
Location_A 3
Location_B 5
dtype: int64
If Timestamp isn't your index, either replace df.index.date with df['Timestamp'].dt.date, or set it as the index with df.set_index('Timestamp').
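For example, a minimal sketch assuming the column is named Timestamp as in the question:
# make the timestamps the index, then apply the same crosstab as above
df = df.set_index('Timestamp')
(pd.crosstab(df.index.date, df.Location) > 0).sum()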
A different approach could be
df.groupby(['Location', pd.Grouper(freq='D')]).size().unstack(0).count()
and the same thing with pivot_table would be
df.pivot_table(index=df.index.date, columns='Location', aggfunc='size').count()
In both cases, the size aggregation is a bit arbitrary; any constant function (such as lambda _: 1) would do as well, since the final count() only cares whether a value is present for a given day.
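To see why the final count() gives the days open, it can help to look at the intermediate daily table (a sketch; exact dtypes and formatting may differ by pandas version):
# one row per date, one column per location; NaN means no events for that location on that day
daily = df.pivot_table(index=df.index.date, columns='Location', aggfunc='size')
daily.count()  # counts non-NaN cells per column, i.e. days with at least one event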
Upvotes: 0
Reputation: 15738
Assuming Timestamp is a string,
df['Timestamp'].str.split(' ', n=1).str[0].groupby(df['Location']).nunique()
The first part, df['Timestamp'].str.split(' ', n=1).str[0], extracts the date part of the string. Then it is grouped by location so that we count unique dates separately for each location.
If Timestamp is a datetime column, the left part simplifies to df['Timestamp'].dt.date. The full expression then becomes:
df['Timestamp'].dt.date.groupby(df['Location']).nunique()
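Putting it together, a sketch that also reshapes the result into the Location / Days_Open layout from the question:
days_open = df['Timestamp'].dt.date.groupby(df['Location']).nunique()
days_open = days_open.rename('Days_Open').reset_index()  # columns: Location, Days_Open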
Upvotes: 2
Reputation: 14094
This is another approach: group by date and count.
df.groupby(by=df['Timestamp'].dt.date).agg(count=('Location', 'value_counts')).reset_index()['Location'].value_counts()
Location_B 5
Location_A 3
Name: Location, dtype: int64
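The same idea can be written in two explicit steps, using drop_duplicates instead of the nested value_counts (a sketch; 'day' is just a helper column introduced here):
# keep one row per (day, location) pair, then count how many days each location appears on
daily = df.assign(day=df['Timestamp'].dt.date).drop_duplicates(['day', 'Location'])
daily['Location'].value_counts()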
Upvotes: 0