Reputation: 65
I have a dataframe containing one column of lists.
names                                  unique_values
[B-PER, I-PER, I-PER, B-PER]           2
[I-PER, N-PER, B-PER, I-PER, A-PER]    4
[B-PER, A-PER, I-PER]                  3
[B-PER, A-PER, A-PER, A-PER]           2
I need to count the distinct values in each list of the column: a value that appears more than once in a list should be counted only once. How can I achieve this?
Thanks
Upvotes: 0
Views: 135
Reputation: 2857
You can use the built-in set data type to do this:
df['unique_values'] = df['names'].apply(lambda a : len(set(a)))
This works because sets cannot contain duplicate elements: converting a list to a set drops the duplicates, so the length of the resulting set is the number of distinct values in the list.
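To see the set behaviour on its own, here is a minimal sketch in plain Python, without pandas. The `rows` variable is a hypothetical stand-in for the `names` column, filled with the sample lists from the question.

```python
# Hypothetical stand-in for the `names` column, using the question's sample data.
rows = [
    ["B-PER", "I-PER", "I-PER", "B-PER"],
    ["I-PER", "N-PER", "B-PER", "I-PER", "A-PER"],
    ["B-PER", "A-PER", "I-PER"],
    ["B-PER", "A-PER", "A-PER", "A-PER"],
]

# set(row) drops repeated labels; len() then counts the distinct ones.
unique_counts = [len(set(row)) for row in rows]

print(unique_counts)  # [2, 4, 3, 2]
```

These counts match the `unique_values` column expected in the question.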
To ignore NaN values in a list, you can do the following:
df['unique_values'] = df['names'].apply(lambda a : len([x for x in set(a) if str(x) != 'nan']))
Upvotes: 1
Reputation: 323386
Combine explode with nunique:
df["unique_values"] = df.names.explode().groupby(level = 0).nunique()
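As a rough illustration of what the one-liner above does step by step, the sketch below rebuilds the question's sample data (the DataFrame construction is an assumption, since the question only shows the printed table) and splits the chain into two stages.

```python
import pandas as pd

# Assumed reconstruction of the question's DataFrame.
df = pd.DataFrame({
    "names": [
        ["B-PER", "I-PER", "I-PER", "B-PER"],
        ["I-PER", "N-PER", "B-PER", "I-PER", "A-PER"],
        ["B-PER", "A-PER", "I-PER"],
        ["B-PER", "A-PER", "A-PER", "A-PER"],
    ]
})

# explode() gives one row per list element and repeats the original index,
# so all elements of the first list share index 0, and so on.
exploded = df["names"].explode()

# Grouping on that repeated index (level=0) and calling nunique()
# counts the distinct values per original row.
df["unique_values"] = exploded.groupby(level=0).nunique()

print(df["unique_values"].tolist())  # [2, 4, 3, 2]
```

By default nunique() does not count NaN, so missing values inside a list are skipped without extra filtering.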
Upvotes: 2
Reputation: 4771
Try:
df["unique_values"] = df.names.explode().groupby(level = 0).unique().str.len()
Output
df
names unique_values
0 [B-PER, I-PER, I-PER, B-PER] 2
1 [I-PER, N-PER, B-PER, I-PER, A-PER] 4
2 [B-PER, A-PER, I-PER] 3
3 [B-PER, A-PER, A-PER, A-PER] 2
Upvotes: 0