whytheq

Reputation: 35557

Summary data for pandas dataframe

describe() doesn't do exactly what I'd like, so I'm rolling my own version.

The following works fine apart from the final metric, 'Num Unique Values', which returns numbers that are not correct. I guess I'm using apply incorrectly?

pd.DataFrame({
    'Max': d.max(),
    'Min': d.min(),
    'Count': d.count(axis=0),
    'Count Null': d.isnull().sum(),
    'Count Zero': d[d == 0].count(),
    'Num Unique Values': d.apply(lambda x: x.nunique())
})

Upvotes: 1

Views: 55

Answers (1)

jezrael

Reputation: 862431

It works correctly for me:

print(df.apply(lambda x: x.nunique()))

Sample:

df = pd.DataFrame({'A':[1,2,2,1],
                   'B':[4,5,6,4],
                   'C':[7,8,9,1],
                   'D':[1,3,5,9]})

print(df)
   A  B  C  D
0  1  4  7  1
1  2  5  8  3
2  2  6  9  5
3  1  4  1  9

print(df.apply(lambda x: x.nunique()))
A    2
B    3
C    4
D    4
dtype: int64

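Another solution uses len(x.unique()). Note that unique() keeps NaN as a value, while nunique() ignores NaN by default, so the two can differ on columns with missing data:

print(df.apply(lambda x: len(x.unique())))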
A    2
B    3
C    4
D    4
dtype: int64
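
If the counts look wrong on your real data, one possible cause is missing values. This is only a guess, since the question doesn't show the data. Below is a minimal sketch of the whole summary table on a hypothetical frame (the data and the 'Num Unique Incl NaN' column name are my own, not from the question). It uses DataFrame.nunique() directly instead of apply, plus the dropna=False option to count NaN as a value:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question); column B contains a NaN.
df = pd.DataFrame({'A': [1, 2, 2, 1],
                   'B': [4.0, np.nan, 6.0, 4.0],
                   'C': [0, 8, 0, 1]})

summary = pd.DataFrame({
    'Max': df.max(),
    'Min': df.min(),
    'Count': df.count(),                  # non-null values per column
    'Count Null': df.isnull().sum(),
    'Count Zero': (df == 0).sum(),        # same result as df[df == 0].count()
    'Num Unique Values': df.nunique(),    # NaN is ignored by default
    'Num Unique Incl NaN': df.nunique(dropna=False),  # NaN counted as a value
})
print(summary)
```

For column B, 'Num Unique Values' is 2 (4 and 6), while 'Num Unique Incl NaN' is 3, which matches len(df['B'].unique()). If your expected numbers include NaN, dropna=False may be what you are after.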

Upvotes: 1
