Reputation: 1971
I would like to collect simple stats for each column in a Pandas DataFrame. Collecting the number of non-empty data points was no problem:
valueCountSeries = mydataframe.count()
However, I would like to join this information with a Series containing the number of unique values per column. At the moment I compute these values as follows:
header = list(mydataframe.columns.values)
unique = [(c, mydataframe[c].nunique()) for c in header]
So I have unique as a list of (column, count) tuples, but not as a Pandas Series.
Essentially I want a Series (call it uniqueCountSeries) so I can reach the next step:
df = pd.DataFrame([valueCountSeries, uniqueCountSeries])
Is there a Pandas-esque way to get unique as a Series so I can join the result with valueCountSeries in a new DataFrame?
Adapting the example from the answer below, given the following matrix:
   A  B  C    D
0  4  0  3    3
1  3  1  3    2
2  4  0  0  nan
3  2  1  0    1
4  1  0  1    4
I want to compute:
   count  nunique
A      5        4
B      5        2
C      5        3
D      4        4
Thanks!
Upvotes: 0
Views: 2406
Reputation: 879571
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(5, size=(5,4)), columns=list('ABCD'))
print(df)
#    A  B  C  D
# 0  4  0  3  3
# 1  3  1  3  2
# 2  4  0  0  4
# 3  2  1  0  1
# 4  1  0  1  4
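# Apply each aggregation column-wise, keyed by the function's name.
# count is listed first so the columns come out as (count, nunique)
# on pandas versions that keep dict insertion order in pd.concat.
dct = {func.__name__: df.apply(func) for func in (pd.Series.count, pd.Series.nunique)}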
print(pd.concat(dct, axis=1))
yields
   count  nunique
A      5        4
B      5        2
C      5        3
D      5        4
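As a possible alternative, here is a minimal sketch that gets the same table without going through apply. It assumes a reasonably recent pandas, where DataFrame.count, DataFrame.nunique and DataFrame.agg are all available; the frame below is typed in by hand to mirror the matrix from the question (including the NaN in column D), and the names stats and stats_agg are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the question's matrix, with a missing value in D.
df = pd.DataFrame({
    "A": [4, 3, 4, 2, 1],
    "B": [0, 1, 0, 1, 0],
    "C": [3, 3, 0, 0, 1],
    "D": [3, 2, np.nan, 1, 4],
})

# Both methods return a Series indexed by column name, so they can be
# placed side by side; keys= supplies the column labels of the result.
stats = pd.concat([df.count(), df.nunique()], axis=1, keys=["count", "nunique"])
print(stats)

# Same idea via agg: one row per statistic, transposed so that each
# original column becomes a row.
stats_agg = df.agg(["count", "nunique"]).T
print(stats_agg)
```

Both count and nunique skip NaN by default, so column D should come out with a count of 4 here rather than 5, which matches the table asked for in the question.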
Upvotes: 6