Reputation: 183
I have a dataframe as shown below.
   0  1  2
0  A  B  C
1  B  C  B
2  B  D  E
3  C  E  E
4  B  F  A
I need the count of unique values across the entire dataframe, not column-wise unique values. In the dataframe above the unique values are A, B, C, D, E and F, so the result I need is 6.
I'm currently doing this with pandas squeeze, NumPy ravel and pandas nunique, which flatten the entire dataframe into a single Series:
pd.Series(df.squeeze().values.ravel()).nunique(dropna=True)
Please let me know if there is any better way to achieve this.
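A self-contained sketch of the question's approach, rebuilding the 5x3 example dataframe by hand (the construction below is an assumption; the question only shows the printed frame). It also illustrates that DataFrame.squeeze() returns the frame unchanged when it has more than one row and more than one column, so for this shape the squeeze() call can be dropped without changing the result.

```python
import pandas as pd

# Rebuild the 5x3 example dataframe from the question (columns 0, 1, 2)
df = pd.DataFrame([list("ABC"), list("BCB"), list("BDE"), list("CEE"), list("BFA")])

# The question's approach: flatten every cell into one Series, then count distinct values
original = pd.Series(df.squeeze().values.ravel()).nunique(dropna=True)

# squeeze() leaves a 5x3 frame unchanged, so it can be dropped for this shape
simplified = pd.Series(df.to_numpy().ravel()).nunique()

print(original, simplified)  # 6 6
```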
Upvotes: 4
Views: 1189
Reputation: 6483
You can also use set, len and flatten:
len(set(df.values.flatten()))
Out:
6
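A minimal, runnable version of the set approach on the question's sample data (the dataframe construction is assumed, as above). flatten() returns a 1-D copy of the 2-D array and set() keeps one copy of each hashable value. One behavioural difference worth knowing: unlike nunique(dropna=True), set() does not skip missing values, so a NaN cell would be counted as a value.

```python
import pandas as pd

# Sample data from the question (construction assumed)
df = pd.DataFrame([list("ABC"), list("BCB"), list("BDE"), list("CEE"), list("BFA")])

# flatten() gives a 1-D copy of all 15 cells; set() deduplicates by hashing
values = df.values.flatten()
unique_values = set(values)

print(len(unique_values))  # 6
```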
Timings, using a dummy dataframe with 6 unique values:
import numpy as np
import pandas as pd

# dummy data: 10**6 rows, both columns drawn from the same 6 strings
choices = ['aa', 'bbbb', 'c', 'ddddd', 'EeeeE', 'xxx']
df = pd.DataFrame({'Day': np.random.choice(choices, 10**6), 'Heloo': np.random.choice(choices, 10**6)})
print(df.shape)
(1000000, 2)
%timeit len(set(df.values.flatten()))
89.5 ms ± 1.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.unique(df.values).shape[0]
1.61 s ± 25.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit len(np.unique(df))
1.85 s ± 229 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
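%timeit is an IPython magic, so here is a rough sketch of the same comparison with the standard-library timeit module for plain Python scripts. n is set to 10**5 (an assumption, smaller than the original 10**6) to keep the run short; raise it to reproduce the original setup. A likely reason for the gap in the timings above is that np.unique sorts the flattened array, which on object dtype means Python-level comparisons, whereas set() only hashes each value; your absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Dummy data as in the answer, but with fewer rows (n is an assumption)
choices = ['aa', 'bbbb', 'c', 'ddddd', 'EeeeE', 'xxx']
n = 10**5
df = pd.DataFrame({'Day': np.random.choice(choices, n), 'Heloo': np.random.choice(choices, n)})

# The three approaches from this thread, wrapped as zero-argument callables
candidates = {
    "len(set(df.values.flatten()))": lambda: len(set(df.values.flatten())),
    "np.unique(df.values).shape[0]": lambda: np.unique(df.values).shape[0],
    "len(np.unique(df))": lambda: len(np.unique(df)),
}

results = {}
for name, fn in candidates.items():
    results[name] = fn()  # all three should report the same count
    # best of 3 single runs, in seconds
    best = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{name}: {best:.3f} s")
```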
Upvotes: 1
Reputation: 862661
Use numpy.unique and take the length of the result:
out = len(np.unique(df))
6
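A short sketch of what np.unique returns here, on the question's sample data (dataframe construction assumed). With the default axis=None, np.unique converts the DataFrame to an array, flattens it, and returns its sorted unique elements as a 1-D array, so len() and .shape[0] give the same count.

```python
import numpy as np
import pandas as pd

# Sample data from the question (construction assumed)
df = pd.DataFrame([list("ABC"), list("BCB"), list("BDE"), list("CEE"), list("BFA")])

# axis=None (the default): the 5x3 input is flattened before deduplication
uniques = np.unique(df)

print(uniques.tolist())  # ['A', 'B', 'C', 'D', 'E', 'F']
print(len(uniques))      # 6
print(uniques.shape[0])  # 6, same as len() for a 1-D result
```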
Upvotes: 4
Reputation: 1456
Use NumPy for this:
import numpy as np
print(np.unique(df.values).shape[0])
Upvotes: 4