Bilal

Reputation: 65

Counting unique elements in lists

I have a dataframe containing one column of lists.

names                                       unique_values
[B-PER,I-PER,I-PER,B-PER]                        2
[I-PER,N-PER,B-PER,I-PER,A-PER]                  4
[B-PER,A-PER,I-PER]                              3
[B-PER, A-PER,A-PER,A-PER]                       2

I need to count the distinct values in each list in this column, so a value that appears more than once in a list is counted only once. How can I achieve this?

Thanks

Upvotes: 0

Views: 135

Answers (3)

Karan Shishoo

Reputation: 2857

You can use the built-in set data type to do this:

df['unique_values'] = df['names'].apply(lambda a: len(set(a)))

This works because a set cannot contain duplicate elements: converting a list to a set drops the duplicates, so the length of the resulting set is the number of distinct values.

To ignore NaN values in a list, you can do the following:

df['unique_values'] = df['names'].apply(lambda a: len([x for x in set(a) if str(x) != 'nan']))
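
As an alternative to the string comparison, here is a minimal sketch that filters with pd.isna instead, so a literal string 'nan' would still be counted. The sample frame and the helper name count_unique are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the last list contains a NaN on purpose.
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["I-PER", "N-PER", "B-PER", "I-PER", "A-PER"],
        ["B-PER", "A-PER", "I-PER"],
        ["B-PER", np.nan, "A-PER", "A-PER"],
    ]
})

def count_unique(values):
    # Hypothetical helper: build a set of the non-missing values and count it.
    return len({v for v in values if not pd.isna(v)})

df["unique_values"] = df["names"].apply(count_unique)
print(df)
```

This assumes every list element is a scalar; pd.isna returns an array for list-like input, which would break the filter.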

Upvotes: 1

BENY

Reputation: 323386

Combine explode with nunique:

df["unique_values"] = df.names.explode().groupby(level = 0).nunique()
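
For illustration, a self-contained sketch of this approach on made-up data. explode gives each list element its own row while repeating the original index label, so groupby(level=0) collects the elements of each list again. Since nunique skips NaN by default, an empty list (which explode turns into a single NaN row) should come out as 0.

```python
import pandas as pd

# Made-up sample data, including an empty list as an edge case.
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["B-PER", "A-PER", "A-PER", "A-PER"],
        [],
    ]
})

# One row per list element; the index repeats the original row label.
exploded = df["names"].explode()

# Group by the original row label and count distinct non-NaN values.
df["unique_values"] = exploded.groupby(level=0).nunique()
print(df)
```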

Upvotes: 2

Pablo C

Reputation: 4771

Try:

df["unique_values"] = df.names.explode().groupby(level = 0).unique().str.len()

Output

df
                                 names  unique_values
0         [B-PER, I-PER, I-PER, B-PER]              2
1  [I-PER, N-PER, B-PER, I-PER, A-PER]              4
2                [B-PER, A-PER, I-PER]              3
3         [B-PER, A-PER, A-PER, A-PER]              2
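
One difference from nunique worth knowing about: unique() keeps NaN as a value of its own, so this version counts a missing entry as one extra distinct value. A small sketch on made-up data; the column names with_nan and without_nan are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the second list contains a NaN.
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["B-PER", np.nan, "A-PER", "A-PER"],
    ]
})

grouped = df["names"].explode().groupby(level=0)

# unique() returns an array per row that still contains NaN.
df["with_nan"] = grouped.unique().str.len()

# nunique() drops NaN by default.
df["without_nan"] = grouped.nunique()
print(df)
```

If the lists never contain missing values, both columns should agree.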

Upvotes: 0
