David542

Reputation: 110277

Getting unique values and casting to a string

How would I get the unique non-null values of a column in the data frame below and join them into a single string? For example:

import pandas as pd
df = pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1}, {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])

I want to get:

       subs
1      'en,fr'

Currently I have something like:

summary_df = df.groupby(['field1', 'field2']).agg(
    subs=('language', 'unique'),
).reset_index()

But it seems this has three problems:

Here is what I'm currently doing. Is this a good approach, or are there places to improve?

subs=('burned_in_sub_language', lambda x: str(sorted(x.dropna().unique())))

Upvotes: 0

Views: 863

Answers (2)

DYZ

Reputation: 57033

  1. Clean and sort.

  2. Group and select.

  3. Collect unique labels and convert them to a string.

  4. Rename the column, if needed.

    df.dropna().sort_values('language')\
            .groupby('id')['language']\
            .unique().str.join(',')\
            .reset_index().rename(columns={'language': 'subs'})
    #   id   subs
    #0   1  en,fr
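
The same four steps can probably also be folded into a single named aggregation, which stays closer to the `agg` call in the question. This is only a sketch: `join_unique` is a hypothetical helper name, and it assumes the non-null values in the column are all strings.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'language': 'en'},
    {'id': 1},
    {'id': 1, 'language': 'fr'},
    {'id': 1, 'language': 'en'},
])


def join_unique(values: pd.Series) -> str:
    """Hypothetical helper: sorted, distinct, non-null values as one string."""
    # dropna() removes the missing language, unique() removes duplicates,
    # sorted() makes the order deterministic, and join() builds the string.
    return ','.join(sorted(values.dropna().unique()))


summary_df = (
    df.groupby('id')
      .agg(subs=('language', join_unique))
      .reset_index()
)
print(summary_df)
```

Note that `','.join(...)` yields `en,fr`, whereas `str(sorted(...))` from the question yields the list's repr, `['en', 'fr']`. A group with no non-null values would come out as an empty string here.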
    

Upvotes: 1

adhg

Reputation: 10863

df.dropna().groupby('id')['language'].unique().reset_index().rename(columns={'language':'subs'})

Result (each `subs` value is an array of the unique languages, not yet a joined string):

    id  subs
0   1   [en, fr]
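
Since the question asks for a string such as 'en,fr', the arrays in `subs` still need to be joined. One possible way to do that afterwards is sketched below, assuming every non-null language is a string; the variable name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'language': 'en'},
    {'id': 1},
    {'id': 1, 'language': 'fr'},
    {'id': 1, 'language': 'en'},
])

# Same pipeline as above: one array of unique languages per id.
result = (
    df.dropna()
      .groupby('id')['language']
      .unique()
      .reset_index()
      .rename(columns={'language': 'subs'})
)

# Join each array into a sorted, comma-separated string.
result['subs'] = result['subs'].map(lambda langs: ','.join(sorted(langs)))
print(result)
```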

Upvotes: 1
