Pandas Groupby: How to get distinct Column Values

Question

I am trying to get distinct values for columns after grouping, but I am getting a total instead. The groupby also seems to drop the object (string) distinction and lose the leading zeros in the zip column.

df = pd.read_csv("trial.txt",sep='|',converters={'zip':str},keep_default_na=True,low_memory=False)

Data:

Emp  State  Zip    Jan  feb  mar
Int  NY     11111  1    0    1
int  NY     11111  1    1    0
int  NC     09999  2    2    0
int  ON     NH443  2    2    2

After grouping:

df2 = df.groupby("Zip").count()

In df2, the row for zip 11111 shows 2 2 2 for all 12 months, where I would expect 2 1 1, and zip 09999 shows as 9999.

What is wrong with the grouping that it does not give distinct column values? I have accounted for non-null values (there are no nulls). The column values are only 0, 1, or 2.

Alexander · Accepted Answer

count returns the number of non-missing values in each group, so a value of zero is still included in the count. To count only positive values, you can apply a lambda function that sums the values greater than zero (each match counts as one).

>>> df.groupby('Zip')[['Jan', 'feb', 'mar']].apply(lambda x: x.gt(0).sum())
       Jan  feb  mar
Zip                 
09999    1    1    0
11111    2    1    1
NH443    1    1    1
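For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample data inline instead of reading trial.txt (the inline string and the names `raw`, `months`, `plain_counts`, and `positive_counts` are assumptions for illustration). It also touches the leading-zero part of the question: one possible cause is that the key in `converters={'zip':str}` does not match a header spelled `Zip`, since column names are case-sensitive, so the sketch passes `dtype={"Zip": str}` with the exact header name.

```python
import io

import pandas as pd

# Sample rows from the question, pipe-delimited like trial.txt (assumed layout).
raw = """Emp|State|Zip|Jan|feb|mar
Int|NY|11111|1|0|1
int|NY|11111|1|1|0
int|NC|09999|2|2|0
int|ON|NH443|2|2|2
"""

# Column names are case-sensitive: the key must be "Zip", not "zip",
# for the zip codes to be kept as strings with their leading zeros.
df = pd.read_csv(io.StringIO(raw), sep="|", dtype={"Zip": str})

months = ["Jan", "feb", "mar"]

# count() tallies every non-missing value, zeros included.
plain_counts = df.groupby("Zip")[months].count()

# Build a boolean mask of positive values, then sum it per zip:
# each True counts as 1, so zeros are no longer counted.
positive_counts = df[months].gt(0).groupby(df["Zip"]).sum()

print(plain_counts)
print(positive_counts)
```

Grouping the boolean mask directly avoids the Python-level lambda and should give the same table as the apply version above.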
