Toaster

Reputation: 1971

Pandas Compute Unique Values per Column as Series

I would like to collect simple stats for each column in a Pandas DataFrame. Collecting the number of non-null values per column was no problem:

valueCountSeries = mydataframe.count()

However, I would like to join this information with a Series containing the number of unique values per column. At the moment I compute these values as follows:

header = list(mydataframe.columns.values)
unique = [(c, mydataframe[c].nunique()) for c in header]

So I have the unique counts, but as a list of (column, count) tuples rather than a Pandas Series.
Essentially, I want a Series so I can reach the next step:

df = pd.DataFrame([valueCountSeries, uniqueCountSeries])

Is there a Pandas-esque way to get unique as a Series so I can join the result with valueCountSeries in a new DataFrame?

Adapting the example from the answer below, given the following DataFrame:

    A  B  C  D
 0  4  0  3  3
 1  3  1  3  2
 2  4  0  0  nan
 3  2  1  0  1
 4  1  0  1  4

I want to compute:

   count  nunique
A      5        4
B      5        2
C      5        3
D      4        4

Thanks!

Upvotes: 0

Views: 2406

Answers (1)

unutbu

Reputation: 879571

import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(5, size=(5,4)), columns=list('ABCD'))
print(df)
#    A  B  C  D
# 0  4  0  3  3
# 1  3  1  3  2
# 2  4  0  0  4
# 3  2  1  0  1
# 4  1  0  1  4
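# Apply each aggregation to every column; the function names become the column labels.
# count is listed first so the column order matches the output shown below.
dct = {func.__name__: df.apply(func) for func in (pd.Series.count, pd.Series.nunique)}
# Concatenate the resulting Series side by side, one column per aggregation.
print(pd.concat(dct, axis=1))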

yields

   count  nunique
A      5        4
B      5        2
C      5        3
D      5        4
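
A possibly simpler variant, offered only as a sketch: it assumes a pandas version in which DataFrame.nunique is available, so both statistics come back directly as Series indexed by column name. The frame below is rebuilt by hand from the matrix in the question (including the NaN in column D), and the name summary is just illustrative.

```python
import numpy as np
import pandas as pd

# The example frame from the question, including the missing value in column D.
df = pd.DataFrame({
    "A": [4, 3, 4, 2, 1],
    "B": [0, 1, 0, 1, 0],
    "C": [3, 3, 0, 0, 1],
    "D": [3, 2, np.nan, 1, 4],
})

# count() and nunique() each return a Series indexed by column name,
# so building a DataFrame from a dict aligns them on that shared index.
summary = pd.DataFrame({"count": df.count(), "nunique": df.nunique()})
print(summary)
```

Both count and nunique skip NaN by default, so column D should report 4 for each, which is what the question asks for.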

Upvotes: 6
