mlabenski
mlabenski

Reputation: 33

Comparing pandas rows that have identical values in columns, and then group by the unique id

While there are questions on stackoverflow that somewhat specify the same queries i'm looking to make on a pandas dataframe, none have been able to identify MULTIPLE rows that share the same values.

To explain my issue, I have a dataframe with info on the people/time slot for when they decide to use the gym. It looks like this,

,User,Time,Date
0, User 1 ,12:00PM ,10/5/20             (Identical 3 times)
1, User 2 ,12:00PM ,10/5/20             (Identical 3 times)
2, User 3 ,12:00PM ,10/5/20
3, User 1 ,1:00PM ,10/4/20              (Identical 2 times)
4, User 2 ,1:00PM ,10/4/20              (Identical 2 times)
5, User 5 ,1:00PM ,10/4/20
6, User 6 ,1:00PM ,10/4/20
7, User 7 ,12:00PM ,10/4/20
9, User 1 ,11:00AM ,10/4/20            (Identical 1 time)
10, User 2 ,11:00AM ,10/4/20           (Identical 1 time)
11, User 3 ,10:00AM ,10/4/20
12, User 6 ,10:00AM ,10/4/20
13, User 7 ,10:00AM ,10/4/20

My goal is to create a dataframe that would group the "Names" column by the "Time" and "date" for every row in the frame, which would create a count of Users that share the same signup time/date as another User. By doing that on the dataset above, it should look somewhat like-

 ,User, User, Count of identical gym times
0, User 1 , User 2, 3 
1, User 3 , User 1, 1 
2, User 3 , User 2, 1
3, User 1 , User 5, 1
4, User 2 , User 5, 1
5, User 2 , User 6, 1
6, User 3 , User 6, 1
7, User 3 , User 7, 1
8, User 4 , User 6, 1
9, User 4 , User 7, 1

I followed a few guides that tried to count the amount of times that rows are similar,

df.groupby('Date').User.nunique()

would return,

Date
2020-08-20     6
2020-08-21    13
2020-08-22    15
2020-08-23    18
2020-08-24    25
2020-08-25    24
2020-08-26    24
2020-08-27    24
2020-08-28    20
2020-08-29    12
2020-08-30     8

This doesnt help. Are there any other guides for this "query"?

Upvotes: 0

Views: 195

Answers (1)

Mehdi Golzadeh
Mehdi Golzadeh

Reputation: 2583

Does this output helps?

df.groupby(['Time','Date'],as_index=False).agg({'User':lambda x:  [item for item in x]})

      Time     Date                          User
0  10:00AM  10/4/20         [User3, User6, User7]
1  11:00AM  10/4/20                [User1, User2]
2  12:00PM  10/4/20                       [User7]
3  12:00PM  10/5/20         [User1, User2, User3]
4   1:00PM  10/4/20  [User1, User2, User5, User6]

Upvotes: 0

Related Questions