Incremental count in pandas

Question

I have the dataset as below. and want to get count of C2 group by C1 as shown below.

Expected result below. Count of b should be the items not present in a. Same for c ie not present in a or b

a   5
b   1
c   1

is there any logic i can use in pandas to get this. other than iterating through the records

DeepSpace · Accepted Answer

You can remove duplicates on 'C2' column and count the values that left on 'C1':

print(df.drop_duplicates('C2')['C1'].value_counts())
#  a    5
#  b    1
#  c    1
#  Name: C1, dtype: int64

The output will be a series. You can use to_frame if you want a dataframe:

df = df.drop_duplicates('C2')['C1'].value_counts().to_frame()
print(df)
#     C1
#  a   5
#  b   1
#  c   1

Caveat

value_counts sorts the output by default. In this case it doesn't make a difference, but you can pass sort=False so the lines (indexes) will stay in the same order as in the original dataframe.

Answers (1)