dropping duplicates on one specific column and add a new column as count of repeat records pandas

Question

I have a pandas df like this

student_id	A	B
1	3	13
2	4	23
1	5	12
4	28	32
1	38	12
2	21	14

My desired output: I want to drop the duplicates, and count how many duplicates there are according to student_id and keeping the last record/row and append the count column as new column, also average the duplicated rows entry in A and B as new columns

student_id	A	B	count	average A rounded	average B rounded
1	38	12	3	15	12
2	21	14	2	13	19
4	28	32	1	28	32

tdy · Accepted Answer

You can use named aggregation:

df.groupby('student_id', as_index=False).agg(
    A=('A', 'last'),
    B=('B', 'last'),
    count=('student_id', 'count'),
    average_A_rounded=('A', lambda x: np.mean(x).round()),
    average_B_rounded=('B', lambda x: np.mean(x).round()),
)

#    student_id   A   B  count  average_A_rounded  average_B_rounded
# 0           1  38  12      3                 15                 12
# 1           2  21  14      2                 12                 18
# 2           4  28  32      1                 28                 32

dropping duplicates on one specific column and add a new column as count of repeat records pandas

Answers (2)

Related Questions