Getting unique values and casting to a string

Question

How would I get the unique non-null values for the data frame below and cast them to a string? For example:

import pandas as pd
df=pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1}, {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])

I want to get:

       subs
1      'en,fr'

Currently I have something like:

summary_df = df.groupby('id').agg(
    subs=('language', 'unique'),
).reset_index()

But it seems this has three problems:

1. It includes nulls.
2. I cannot save this to SQL since it returns an array (I guess I need a string for that).
3. I also want it sorted.

Here is what I'm currently doing. Is this a good approach, and are there places to improve it?

subs=('language', lambda x: str(sorted(x.dropna().unique())))
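For reference, here is a minimal sketch of what that lambda yields on the sample frame, next to a `','.join(...)` variant. It assumes the aggregated column is `language` and the grouping key is `id`, as in the example data; the names `as_repr` and `as_joined` are only illustrative. `str(...)` on a sorted list gives the list's repr, brackets and quotes included, while `join` gives the plain comma-separated string.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'language': 'en'},
    {'id': 1},
    {'id': 1, 'language': 'fr'},
    {'id': 1, 'language': 'en'},
])

# str() of a list keeps the brackets and quotes of the list repr.
as_repr = df.groupby('id').agg(
    subs=('language', lambda x: str(sorted(x.dropna().unique()))),
).reset_index()

# ','.join() produces the bare comma-separated string.
as_joined = df.groupby('id').agg(
    subs=('language', lambda x: ','.join(sorted(x.dropna().unique()))),
).reset_index()

print(as_repr.loc[0, 'subs'])
print(as_joined.loc[0, 'subs'])
```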

DYZ · Accepted Answer

1. Clean and sort.
2. Group and select.
3. Collect unique labels and convert them to a string.
4. Rename the column, if needed.

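# Drop rows without a language, then sort so each group's labels come out ordered.
result = (
    df.dropna(subset=['language'])
      .sort_values('language')
      .groupby('id')['language']
      .unique()        # one array of distinct labels per id
      .str.join(',')   # array -> 'en,fr'
      .reset_index()
      .rename(columns={'language': 'subs'})
)
#    id   subs
# 0   1  en,fr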
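If the named-aggregation style from the question is preferred, the same steps can be folded into one aggregation function. This is only a sketch under a few assumptions: `join_unique` is a hypothetical helper name, and the extra `id` 2 row is added here (it is not in the original example) to show a group with no language at all. Because the nulls are dropped inside each group rather than before grouping, such a group is kept with an empty string instead of disappearing from the result.

```python
import pandas as pd

def join_unique(values: pd.Series, sep: str = ',') -> str:
    """Join the sorted, de-duplicated, non-null values into one string."""
    return sep.join(sorted(values.dropna().astype(str).unique()))

df = pd.DataFrame([
    {'id': 1, 'language': 'en'},
    {'id': 1},
    {'id': 1, 'language': 'fr'},
    {'id': 1, 'language': 'en'},
    {'id': 2},  # hypothetical extra row: an id with no language value
])

# Named aggregation: output column 'subs' built from the 'language' column.
summary_df = df.groupby('id').agg(subs=('language', join_unique)).reset_index()
print(summary_df)
```

Each cell of `subs` is a plain Python string, so the column can be written to a SQL text field without further conversion.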
