Reputation: 459
I have a simple dataframe (df) as shown below:
index  Job  Person
1      j1   Cathy
2      j2   Mark
3      j3   Cathy
4      j4   Steve
5      j5   Cathy
6      NaN  Cathy
7      j1   Cathy
I would like to convert the above dataframe to:
Person  CountJob  JobDetails
Cathy   4         j1;j3;j5
Mark    1         j2
Steve   1         j4
i.e., drop the nulls and join the unique jobs for each person.
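For reference, a minimal way to construct this frame (assuming the blank Job on index 6 is a missing value, NaN):

import numpy as np
import pandas as pd

# index 6 has no Job; it is modelled here as NaN (an assumption from the blank cell above)
df = pd.DataFrame(
    {"Job": ["j1", "j2", "j3", "j4", "j5", np.nan, "j1"],
     "Person": ["Cathy", "Mark", "Cathy", "Steve", "Cathy", "Cathy", "Cathy"]},
    index=range(1, 8),
)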
I can partially solve this using groupby:
df.groupby('Person').agg(CountJob=('Job', 'count'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna())))
This drops the null values, but I can't seem to get unique values; in the above example, for 'Cathy', JobDetails becomes j1;j3;j5;j1.
Upvotes: 2
Views: 1452
Reputation: 28729
You could consider dropping the nulls before aggregating:
(
    df.dropna()
      .groupby("Person")
      .agg(CountJob=("Job", "count"), JobDetails=("Job", set))
      .assign(JobDetails=lambda df: df.JobDetails.str.join(","))
)
        CountJob JobDetails
Person
Cathy          4   j1,j3,j5
Mark           1         j2
Steve          1         j4
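One caveat worth noting: Python sets are unordered, so the join order in JobDetails is not guaranteed to be j1,j3,j5. Sorting the set first (a small tweak on the answer above, not part of the original) makes the output deterministic:

(
    df.dropna()
      .groupby("Person")
      # sorted(set(...)) deduplicates and fixes the join order
      .agg(CountJob=("Job", "count"),
           JobDetails=("Job", lambda s: ",".join(sorted(set(s)))))
)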
Upvotes: 2
Reputation: 23099
Another method would be to set the index to Person and stack to drop any nulls; then we can apply the groupby aggregations without any issue.
df.set_index('Person').stack().groupby('Person')\
  .agg(JobDetails='unique', Job='count')
          JobDetails  Job
Person
Cathy   [j1, j3, j5]    4
Mark            [j2]    1
Steve           [j4]    1
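If the semicolon-joined string from the question is wanted rather than a list, the JobDetails column can be joined afterwards; this follow-up step is an addition, not part of the original answer:

out = (df.set_index('Person').stack().groupby('Person')
         .agg(JobDetails='unique', Job='count'))
# join each array of unique jobs into a single 'j1;j3;j5'-style string
out['JobDetails'] = out['JobDetails'].map(';'.join)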
Upvotes: 1
Reputation: 323396
Add unique, chained after dropna:
df.groupby('Person').agg(CountJob=('Job', 'count'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna().unique())))
Out[143]:
        CountJob JobDetails
Person
Cathy          4   j1;j3;j5
Mark           1         j2
Steve          1         j4
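Note that CountJob here counts all non-null entries, so Cathy's 4 includes j1 twice, which matches the expected output in the question. If a count of unique jobs were wanted instead, 'nunique' would be a drop-in replacement; this variant is a sketch of that alternative, not part of the original answer:

# 'nunique' counts distinct non-null jobs per person (Cathy would become 3)
df.groupby('Person').agg(CountJob=('Job', 'nunique'),
                         JobDetails=('Job', lambda x: ';'.join(x.dropna().unique())))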
Upvotes: 3