Reputation: 1401
I have the dataset as below. and want to get count of C2 group by C1 as shown below.
C1 C2
a 1
a 2
a 3
a 4
a 5
b 1
b 2
b 3
b 4
b 5
b 6
c 6
c 7
Expected result below. Count of b should be the items not present in a. Same for c ie not present in a or b
a 5
b 1
c 1
is there any logic i can use in pandas to get this. other than iterating through the records
Upvotes: 1
Views: 168
Reputation: 81604
You can remove duplicates on 'C2'
column and count the values that left on 'C1'
:
print(df.drop_duplicates('C2')['C1'].value_counts())
# a 5
# b 1
# c 1
# Name: C1, dtype: int64
The output will be a series. You can use to_frame
if you want a dataframe:
df = df.drop_duplicates('C2')['C1'].value_counts().to_frame()
print(df)
# C1
# a 5
# b 1
# c 1
Caveat
value_counts
sorts the output by default. In this case it doesn't make a difference, but you can pass sort=False
so the lines (indexes) will stay in the same order as in the original dataframe.Upvotes: 2