LaveyWhiffle

Reputation: 25

Pandas - Determine How Many Days Recorded in Dataframe

I have data formatted like this (where Timestamp is in datetime format):

Timestamp           Location
7/24/2017 14:55:19  Location_A
7/24/2017 15:09:16  Location_B
7/25/2017 14:05:45  Location_B
7/26/2017 8:34:40   Location_A
7/26/2017 8:45:56   Location_B
7/26/2017 9:34:11   Location_A
7/26/2017 10:03:14  Location_B
7/27/2017 9:26:14   Location_B
7/27/2017 9:37:26   Location_B
7/31/2017 10:22:46  Location_A
7/31/2017 10:59:02  Location_B

Each time a timestamp occurs with either Location_A or Location_B, that means the location was open that day. I'd like to count the number of days each location is open so that the output looks like this:

Location    Days_Open
Location_A  3
Location_B  5

I know I need to use groupby and likely a lambda function, but I'm not having much luck. I appreciate any help.

Upvotes: 1

Views: 416

Answers (3)

fuglede

Reputation: 18201

You could do something like

(pd.crosstab(df.index.date, df.Location) > 0).sum()

With your data:

In [64]: df
Out[64]:
                       Location
Timestamp
2017-07-24 14:55:19  Location_A
2017-07-24 15:09:16  Location_B
2017-07-25 14:05:45  Location_B
2017-07-26 08:34:40  Location_A
2017-07-26 08:45:56  Location_B
2017-07-26 09:34:11  Location_A
2017-07-26 10:03:14  Location_B
2017-07-27 09:26:14  Location_B
2017-07-27 09:37:26  Location_B
2017-07-31 10:22:46  Location_A
2017-07-31 10:59:02  Location_B

In [65]: (pd.crosstab(df.index.date, df.Location) > 0).sum()
Out[65]:
Location
Location_A    3
Location_B    5
dtype: int64

If Timestamp isn't your index, either replace df.index.date with df.Timestamp.dt.date, or set it as the index using df.set_index('Timestamp').
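Putting that together, a runnable sketch using the data from the question (the DataFrame is reconstructed here; column names as posted):

```python
import pandas as pd

# Data from the question, reconstructed for illustration
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "7/24/2017 14:55:19", "7/24/2017 15:09:16", "7/25/2017 14:05:45",
        "7/26/2017 8:34:40", "7/26/2017 8:45:56", "7/26/2017 9:34:11",
        "7/26/2017 10:03:14", "7/27/2017 9:26:14", "7/27/2017 9:37:26",
        "7/31/2017 10:22:46", "7/31/2017 10:59:02",
    ]),
    "Location": [
        "Location_A", "Location_B", "Location_B", "Location_A", "Location_B",
        "Location_A", "Location_B", "Location_B", "Location_B", "Location_A",
        "Location_B",
    ],
}).set_index("Timestamp")

# A day counts as "open" if the location has at least one row on that date:
# crosstab gives per-date counts, > 0 turns them into open/closed flags,
# and sum() totals the open days per location.
days_open = (pd.crosstab(df.index.date, df.Location) > 0).sum()
print(days_open)
```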

A different approach could be

df.groupby(['Location', pd.Grouper(freq='D')]).size().unstack(0).count()

and the same thing with pivot_table would be

df.pivot_table(index=df.index.date, columns='Location', aggfunc='size').count()

In both cases, the size is a bit silly; any constant function (such as lambda _: 1) would do as well.
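For instance, the pivot_table variant as a self-contained sketch on the question's data (reconstructed here, with Timestamp as the index):

```python
import pandas as pd

# Data from the question, reconstructed for illustration
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "7/24/2017 14:55:19", "7/24/2017 15:09:16", "7/25/2017 14:05:45",
        "7/26/2017 8:34:40", "7/26/2017 8:45:56", "7/26/2017 9:34:11",
        "7/26/2017 10:03:14", "7/27/2017 9:26:14", "7/27/2017 9:37:26",
        "7/31/2017 10:22:46", "7/31/2017 10:59:02",
    ]),
    "Location": [
        "Location_A", "Location_B", "Location_B", "Location_A", "Location_B",
        "Location_A", "Location_B", "Location_B", "Location_B", "Location_A",
        "Location_B",
    ],
}).set_index("Timestamp")

# Rows are dates, columns are locations, values are row counts; dates with
# no rows for a location become NaN, so count() yields the open-day totals.
days_open = df.pivot_table(index=df.index.date, columns="Location",
                           aggfunc="size").count()
print(days_open)
```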

Upvotes: 0

Marat

Reputation: 15738

Assuming Timestamp is a string,

df['Timestamp'].str.split(' ', n=1).str[0].groupby(df['Location']).nunique()

The first part, df['Timestamp'].str.split(' ', n=1).str[0], extracts the date part of the string. Then, it is grouped by location so that we count unique dates separately by location.

If Timestamp is a timestamp column, the left part will simplify to df['Timestamp'].dt.date. The full expression then will be:

df['Timestamp'].dt.date.groupby(df['Location']).nunique()
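As a self-contained sketch of that expression on the question's data (reconstructed here, Timestamp kept as a regular datetime column):

```python
import pandas as pd

# Data from the question, reconstructed for illustration
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "7/24/2017 14:55:19", "7/24/2017 15:09:16", "7/25/2017 14:05:45",
        "7/26/2017 8:34:40", "7/26/2017 8:45:56", "7/26/2017 9:34:11",
        "7/26/2017 10:03:14", "7/27/2017 9:26:14", "7/27/2017 9:37:26",
        "7/31/2017 10:22:46", "7/31/2017 10:59:02",
    ]),
    "Location": [
        "Location_A", "Location_B", "Location_B", "Location_A", "Location_B",
        "Location_A", "Location_B", "Location_B", "Location_B", "Location_A",
        "Location_B",
    ],
})

# Count distinct dates per location
days_open = df["Timestamp"].dt.date.groupby(df["Location"]).nunique()

# Optional: reshape into the two-column table the question asks for
result = days_open.rename("Days_Open").reset_index()
print(result)
```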

Upvotes: 2

Kenan

Reputation: 14094

This is another approach: group by date and count.

df.groupby(by=df['Timestamp'].dt.date).agg(count=('Location', 'value_counts')).reset_index()['Location'].value_counts()

Location_B    5
Location_A    3
Name: Location, dtype: int64
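The same idea, counting each (date, location) pair once, can also be written with drop_duplicates; a sketch on the question's reconstructed data (this substitutes drop_duplicates for the answer's exact chain):

```python
import pandas as pd

# Data from the question, reconstructed for illustration
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "7/24/2017 14:55:19", "7/24/2017 15:09:16", "7/25/2017 14:05:45",
        "7/26/2017 8:34:40", "7/26/2017 8:45:56", "7/26/2017 9:34:11",
        "7/26/2017 10:03:14", "7/27/2017 9:26:14", "7/27/2017 9:37:26",
        "7/31/2017 10:22:46", "7/31/2017 10:59:02",
    ]),
    "Location": [
        "Location_A", "Location_B", "Location_B", "Location_A", "Location_B",
        "Location_A", "Location_B", "Location_B", "Location_B", "Location_A",
        "Location_B",
    ],
})

# Keep one row per (date, location) pair, then count rows per location
days_open = (
    df.assign(Date=df["Timestamp"].dt.date)
      .drop_duplicates(["Date", "Location"])["Location"]
      .value_counts()
)
print(days_open)
```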

Upvotes: 0
