Reputation: 459
I have a simple dataframe (df) as shown below:
index  Job  Person
1      j1   Cathy
2      j2   Mark
3      j3   Cathy
4      j4   Steve
5      j5   Cathy
6      NaN  Cathy
7      j1   Cathy
I would like to convert the above dataframe to:
Person  CountJob  JobDetails
Cathy   4         j1;j3;j5
Mark    1         j2
Steve   1         j4
i.e., drop the nulls and join the unique jobs for each person.
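For reference, a minimal way to construct this frame (assuming the blank Job on index 6 is a missing value, NaN):

import numpy as np
import pandas as pd

# index 6 has no Job; it is modelled here as NaN (an assumption from the blank cell above)
df = pd.DataFrame(
    {"Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
     "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"]},
    index=range(1, 8),
)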
I can partially solve this using groupby:
df.groupby('Person').agg(CountJob=('Job', 'count'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna())))
This drops the null values, but I can't seem to get unique values; in the above example, for 'Cathy', JobDetails becomes j1;j3;j5;j1.
Upvotes: 2
Views: 1452
Reputation: 28729
You could consider dropping the nulls before aggregating:
(
    df.dropna()
      .groupby("Person")
      .agg(CountJob=("Job", "count"), JobDetails=("Job", set))
      .assign(JobDetails=lambda df: df.JobDetails.str.join(","))
)
        CountJob JobDetails
Person
Cathy          4   j1,j3,j5
Mark           1         j2
Steve          1         j4
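One caveat worth noting: Python sets are unordered, so the join order in JobDetails is not guaranteed to be j1,j3,j5. Sorting the set first (a small tweak on the answer above, not part of the original) makes the output deterministic:

(
    df.dropna()
      .groupby("Person")
      # sorted(set(...)) deduplicates and fixes the join order
      .agg(CountJob=("Job", "count"),
           JobDetails=("Job", lambda s: ",".join(sorted(set(s)))))
)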
Upvotes: 2
Reputation: 23099
Another method would be to set the index to Person and stack to drop any nulls; then we can apply the groupby aggregations without any issue.
df.set_index('Person').stack().groupby('Person')\
  .agg(JobDetails='unique', Job='count')
          JobDetails  Job
Person
Cathy   [j1, j3, j5]    4
Mark            [j2]    1
Steve           [j4]    1
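If the semicolon-joined string from the question is wanted rather than a list, the JobDetails column can be joined afterwards; this follow-up step is an addition, not part of the original answer:

out = (df.set_index('Person').stack().groupby('Person')
         .agg(JobDetails='unique', Job='count'))
# join each array of unique jobs into a single 'j1;j3;j5'-style string
out['JobDetails'] = out['JobDetails'].map(';'.join)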
Upvotes: 1
Reputation: 323396
Add unique, chained after dropna:
df.groupby('Person').agg(CountJob=('Job', 'count'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna().unique())))
Out[143]:
        CountJob JobDetails
Person
Cathy          4   j1;j3;j5
Mark           1         j2
Steve          1         j4
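Note that CountJob here counts all non-null entries, so Cathy's 4 includes j1 twice, which matches the expected output in the question. If a count of unique jobs were wanted instead, 'nunique' would be a drop-in replacement; this variant is a sketch of that alternative, not part of the original answer:

# 'nunique' counts distinct non-null jobs per person (Cathy would become 3)
df.groupby('Person').agg(CountJob=('Job', 'nunique'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna().unique())))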
Upvotes: 3