Reputation: 277
I'm pretty new to Pandas but I'm trying to analyze a dataset of employee timestamps to determine the sum of unique daily timestamps per week.
My initial dataframe (input1) looks like this (but much longer):
ID Datetime Week/Year
0 15.0 2019-02-04 08:28:44 6/2019
1 15.0 2019-02-04 12:48:05 6/2019
2 15.0 2019-02-04 12:54:29 6/2019
3 15.0 2019-02-05 08:05:51 6/2019
4 15.0 2019-02-05 12:47:26 6/2019
5 15.0 2019-02-05 14:45:34 6/2019
6 15.0 2019-02-06 08:10:59 6/2019
7 15.0 2019-02-06 12:49:24 6/2019
8 15.0 2019-02-06 13:02:48 6/2019
9 15.0 2019-02-07 08:02:22 6/2019
10 15.0 2019-02-08 08:02:10 6/2019
11 15.0 2019-02-08 09:55:22 6/2019
I created another dataframe:
df = pd.DataFrame({'Timestamp': input1['Datetime'], 'ID': input1['ID'], 'Week/Year': input1['Week/Year'],'MDY':input1['Server Date/Time'].apply(lambda x: "%d/%d/%d" % (x.month, x.day, x.year))})
Then I grouped by Week, Employee, and got unique count per day (MDY):
df_grouped = df.groupby(['Week/Year', 'ID']).MDY.nunique()
Week/Year ID MDY
6/2019 15.0 5
The end result I'm looking for is to merge the MDY sums back to the initial dataframe by joining on Week and ID - I tried a few different ways:
input1.merge(df_grouped.to_frame(), left_on=['ID','Week/Year'], right_index=True)
to get something like:
ID Datetime Week/Year MDY
0 15.0 2019-02-04 08:28:44 6/2019 5
1 15.0 2019-02-04 12:48:05 6/2019 5
2 15.0 2019-02-04 12:54:29 6/2019 5
3 15.0 2019-02-05 08:05:51 6/2019 5
4 15.0 2019-02-05 12:47:26 6/2019 5
5 15.0 2019-02-05 14:45:34 6/2019 5
After the join I just end up getting NaN across the board. Anyone able to steer me in the right direction?
Thanks.
Upvotes: 0
Views: 1419
Reputation: 25269
this groupby
df_grouped = df.groupby(['Week/Year', 'WD: Employee ID']).MDY.nunique()
should return a series has index as Week/Year WD: Employee ID
Week/Year WD: Employee ID
6/2019 15.0 5
Name: MDY , dtype: int64
However, you show its index as Week/Year ID
. You may check columns name to make sure it match.
Next, on this
input1.merge(df_grouped.to_frame(), left_on=['ID','Week/Year'], right_index=True)
Assume df_grouped
has index as you show in example which is Week/Year ID
, you have left_on
wrong order against right_index
. It should be
input1.merge(df_grouped.to_frame(), left_on=['Week/Year', 'ID'], right_index=True)
Upvotes: 1