user10332687

Get count unique values in a row in pandas

Suppose I have the following data frame:

0     1        2
new   NaN      NaN
new   one      one
a     b        c
NaN   NaN      NaN

How would I get the number of unique (non-NaN) values in a row, such as:

0     1        2       _num_unique_values
new   NaN      NaN     1
new   one      one     2
a     b        c       3
NaN   NaN      NaN     0

I suppose it would be something along the lines of:

df['_num_unique_values'] = len(set(df.loc.tolist())) ??

Upvotes: 6

Views: 13770

Answers (4)

Prayson W. Daniel

Reputation: 15608

Just use nunique(axis=1), which counts the distinct non-NaN values in each row.

import numpy as np
import pandas as pd

# Same values as the frame in the question (row 1 contains 'one' twice)
data = {0: ['new', 'new', 'a', np.nan],
        1: [np.nan, 'one', 'b', np.nan],
        2: [np.nan, 'one', 'c', np.nan]}
df = pd.DataFrame(data)

# print(df.nunique(axis=1))

df['num_unique'] = df.nunique(axis=1)
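
As a self-contained check against the question's frame, here is a minimal sketch of the same call, together with the `dropna` argument of `DataFrame.nunique`. The variable names `counts` and `counts_with_nan` are only illustrative.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    0: ['new', 'new', 'a', np.nan],
    1: [np.nan, 'one', 'b', np.nan],
    2: [np.nan, 'one', 'c', np.nan],
})

# Default behaviour: NaN is ignored, so an all-NaN row counts as 0.
counts = df.nunique(axis=1)

# dropna=False counts NaN as a value of its own.
counts_with_nan = df.nunique(axis=1, dropna=False)

print(counts.tolist())
print(counts_with_nan.tolist())
```

The default (`dropna=True`) is what the question asks for; `dropna=False` is shown only for contrast.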

Upvotes: 11

user12340241

Reputation: 21

A more concise solution:

df['num_uniq']=df.nunique(axis=1)

Upvotes: 2

cs95

Reputation: 403218

Use a list comprehension with set:

df['num_uniq'] = [len(set(v[pd.notna(v)].tolist())) for v in df.values]
df

     0    1    2  num_uniq
0  new  NaN  NaN         1
1  new  one  one         2
2    a    b    c         3
3  NaN  NaN  NaN         0

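You could also do this with stack, groupby and nunique. Note that in the output shown here the counts come back as floats and the all-NaN row gets NaN rather than 0, because stack dropped the NaN values and left that row without a group.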

# df.join(df.stack().groupby(level=0).nunique().to_frame('num_uniq'))
df['num_uniq'] = df.stack().groupby(level=0).nunique()
df

     0    1    2  num_uniq
0  new  NaN  NaN       1.0
1  new  one  one       2.0
2    a    b    c       3.0
3  NaN  NaN  NaN       NaN
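
If a zero count and an integer column are wanted for all-NaN rows, one possible adjustment (a sketch, not part of the original answer) is to reindex the grouped result against the frame's index with `fill_value=0`:

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    0: ['new', 'new', 'a', np.nan],
    1: [np.nan, 'one', 'b', np.nan],
    2: [np.nan, 'one', 'c', np.nan],
})

# Rows without any non-NaN value may be missing from the grouped result,
# so align it to the original index and fill the gaps with 0.
counts = (
    df.stack()
      .groupby(level=0)
      .nunique()
      .reindex(df.index, fill_value=0)
)

df['num_uniq'] = counts
print(df)
```

Because the gap is filled before the assignment, the column stays integer-typed instead of being upcast to float.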

Yet another option is apply and nunique:

df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
df

     0    1    2  num_uniq
0  new  NaN  NaN         1
1  new  one  one         2
2    a    b    c         3
3  NaN  NaN  NaN         0
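
Since apply accepts any function of a row, the set-based idea from the question can also be written this way. This is a rough sketch; the helper name `count_unique` is made up for illustration.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    0: ['new', 'new', 'a', np.nan],
    1: [np.nan, 'one', 'b', np.nan],
    2: [np.nan, 'one', 'c', np.nan],
})


def count_unique(row):
    """Count distinct non-NaN values in one row (hypothetical helper)."""
    return len(set(row.dropna()))


# axis=1 passes each row to the helper as a Series.
df['num_uniq'] = df.apply(count_unique, axis=1)
print(df)
```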

Performance

df_ = df
df = pd.concat([df_] * 1000, ignore_index=True)

%timeit df['num_uniq'] = [len(set(v[pd.notna(v)])) for v in df.values]
%timeit df['num_uniq'] = df.stack().groupby(level=0).nunique()
%timeit df['num_uniq'] = df.apply(pd.Series.nunique, axis=1)
%timeit df['num_uniq'] = df.nunique(axis=1)

196 ms ± 10.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
6.34 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
679 ms ± 24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.21 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
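
%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The `candidates` dictionary and its labels are assumptions made for this example, the stack variant is left out because it treats all-NaN rows differently, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# The frame from the question, repeated 1000 times as in the benchmark above.
base = pd.DataFrame({
    0: ['new', 'new', 'a', np.nan],
    1: [np.nan, 'one', 'b', np.nan],
    2: [np.nan, 'one', 'c', np.nan],
})
big = pd.concat([base] * 1000, ignore_index=True)

# Candidate implementations; each returns one count per row.
candidates = {
    'set comprehension': lambda d: pd.Series(
        [len(set(v[pd.notna(v)])) for v in d.values], index=d.index),
    'apply + Series.nunique': lambda d: d.apply(pd.Series.nunique, axis=1),
    'DataFrame.nunique': lambda d: d.nunique(axis=1),
}

expected = big.nunique(axis=1).tolist()
for name, func in candidates.items():
    # Check agreement first, so a fast but wrong method is not reported.
    assert func(big).tolist() == expected, name
    seconds = timeit.timeit(lambda: func(big), number=3) / 3
    print(f'{name}: {seconds * 1e3:.1f} ms per call')
```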

Upvotes: 6

J...S

Reputation: 5207

It is not as fast as coldspeed's answer with set(), but you could also do:

df['_num_unique_values'] = df.T.nunique()

First, df.T transposes the dataframe so that each original row becomes a column. nunique() then counts the unique values in each of those columns, excluding NaNs.

The result is added as a new column to the original dataframe.

df would now be:

     0    1    2  _num_unique_values
0  new  NaN  NaN                   1
1  new  one  one                   2
2    a    b    c                   3
3  NaN  NaN  NaN                   0
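
As a quick check that the transpose route gives the same counts as nunique(axis=1), here is a small sketch. The names `via_transpose` and `via_axis` are illustrative only.

```python
import numpy as np
import pandas as pd

# The frame from the question.
df = pd.DataFrame({
    0: ['new', 'new', 'a', np.nan],
    1: [np.nan, 'one', 'b', np.nan],
    2: [np.nan, 'one', 'c', np.nan],
})

# Column-wise count on the transposed frame ...
via_transpose = df.T.nunique()

# ... compared with the row-wise count on the original frame.
via_axis = df.nunique(axis=1)

print(via_transpose.tolist() == via_axis.tolist())
```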

Upvotes: 0
