Reputation: 110277
How would I get the unique non-null values per group in the data frame below and convert them to a single string? For example:
import pandas as pd
df=pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1}, {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])
I want to get:
subs
1 'en,fr'
Currently I have something like:
summary_df = df.groupby('id').agg(
    subs=('language', 'unique'),  # named aggregation: output column 'subs'
).reset_index()
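But it seems this has three problems: the result still contains the missing (NaN) value, the values are not sorted, and the result is an array rather than a string.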
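A minimal sketch of what the `unique` aggregation returns on the example frame, grouping by `id` (the only key in that frame). It only inspects the value, so the exact printed repr may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1},
                   {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])

# Same named aggregation as in the question, grouped by 'id'
summary_df = df.groupby('id').agg(
    subs=('language', 'unique'),
).reset_index()

value = summary_df.loc[0, 'subs']
print(type(value))   # an array of labels, not a string
print(list(value))   # labels in order of first appearance, NaN included
```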
Here is what I'm currently doing. Is this approach good or bad, and is there anything I could improve?
subs =('burned_in_sub_language', lambda x: str(sorted(x.dropna().unique())))
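A minimal sketch comparing `str(sorted(...))` with `','.join(sorted(...))` on the example frame. It assumes the `language` column from the example in place of `burned_in_sub_language`. `str()` of a list gives its Python repr, brackets and quotes included, while `','.join` gives the `'en,fr'` form asked for.

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1},
                   {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])

# The approach from the question: str() of the sorted unique labels
as_repr = df.groupby('id').agg(
    subs=('language', lambda x: str(sorted(x.dropna().unique()))),
).reset_index()

# Joining the sorted unique labels instead of calling str() on the list
as_joined = df.groupby('id').agg(
    subs=('language', lambda x: ','.join(sorted(x.dropna().unique()))),
).reset_index()

print(as_repr.loc[0, 'subs'])    # ['en', 'fr']
print(as_joined.loc[0, 'subs'])  # en,fr
```

Both variants drop NaN and sort inside the lambda, so only the final conversion differs.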
Upvotes: 0
Views: 863
Reputation: 57033
Drop missing values and sort by language.
Group by id and select the language column.
Collect the unique labels per group and join them into a comma-separated string.
Rename the column to subs, if needed.
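result = (
    df.dropna()                            # drop rows with a missing value
    .sort_values('language')               # sort so the joined labels are ordered
    .groupby('id')['language']             # group by id, select the language column
    .unique()                              # unique labels per id (one array per group)
    .str.join(',')                         # join each array into one string
    .reset_index()
    .rename(columns={'language': 'subs'})  # rename the column, if needed
)
print(result)
# id subs
#0 1 en,fr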
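One thing to watch: `df.dropna()` drops a row if *any* column is missing, not just `language`. A minimal sketch, assuming a hypothetical extra `note` column that exists only for this illustration, shows how `dropna(subset=['language'])` restricts the check to the language column:

```python
import pandas as pd

# Hypothetical 'note' column, added only to show how dropna() behaves
df = pd.DataFrame([
    {'id': 1, 'language': 'en', 'note': 'a'},
    {'id': 1},
    {'id': 1, 'language': 'fr'},            # 'note' is missing on this row
    {'id': 1, 'language': 'en', 'note': 'b'},
])


def subs_by_id(frame):
    """Sort, group by id and join the unique languages (the chain above)."""
    return (
        frame.sort_values('language')
        .groupby('id')['language']
        .unique()
        .str.join(',')
        .reset_index()
        .rename(columns={'language': 'subs'})
    )


any_column = subs_by_id(df.dropna())                        # also drops the 'fr' row
language_only = subs_by_id(df.dropna(subset=['language']))  # keeps the 'fr' row

print(any_column)
print(language_only)
```

With only `id` and `language`, as in the question, both calls give the same result.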
Upvotes: 1
Reputation: 10863
df.dropna().groupby('id')['language'].unique().reset_index().rename(columns={'language':'subs'})
Desired result
id subs
0 1 [en, fr]
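This version leaves an array of labels in `subs`. A minimal sketch of turning that column into the comma-separated string from the question, sorting each array first so the result does not depend on the row order of the input:

```python
import pandas as pd

df = pd.DataFrame([{'id': 1, 'language': 'en'}, {'id': 1},
                   {'id': 1, 'language': 'fr'}, {'id': 1, 'language': 'en'}])

# The chain above: one array of unique labels per id
res = (
    df.dropna()
    .groupby('id')['language']
    .unique()
    .reset_index()
    .rename(columns={'language': 'subs'})
)

# Convert each array into a sorted, comma-separated string
res['subs'] = res['subs'].apply(lambda labels: ','.join(sorted(labels)))
print(res)
```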
Upvotes: 1