queen of spades
queen of spades

Reputation: 37

dropping duplicates on one specific column and add a new column as count of repeat records pandas

I have a pandas df like this

student_id A B
1 3 13
2 4 23
1 5 12
4 28 32
1 38 12
2 21 14

My desired output: I want to drop the duplicates, and count how many duplicates there are according to student_id and keeping the last record/row and append the count column as new column, also average the duplicated rows entry in A and B as new columns

student_id A B count average A rounded average B rounded
1 38 12 3 15 12
2 21 14 2 13 19
4 28 32 1 28 32

Upvotes: 2

Views: 58

Answers (2)

tdy
tdy

Reputation: 41327

You can use named aggregation:

df.groupby('student_id', as_index=False).agg(
    A=('A', 'last'),
    B=('B', 'last'),
    count=('student_id', 'count'),
    average_A_rounded=('A', lambda x: np.mean(x).round()),
    average_B_rounded=('B', lambda x: np.mean(x).round()),
)

#    student_id   A   B  count  average_A_rounded  average_B_rounded
# 0           1  38  12      3                 15                 12
# 1           2  21  14      2                 12                 18
# 2           4  28  32      1                 28                 32

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195468

I see that you want round the values "half-up". So to extend the @tdy answer:

def round_half_up(x):
    mask = x >= 0
    out = np.empty_like(x)
    out[mask] = np.floor(x[mask] + 0.5)
    out[~mask] = np.ceil(x[~mask] - 0.5)
    return out


df = df.groupby("student_id", as_index=False).agg(
    A=("A", "last"),
    B=("B", "last"),
    count=("A", "count"),
    average_A_rounded=("A", "mean"),
    average_B_rounded=("B", "mean"),
)

print(df.apply(round_half_up).astype(int))

Prints:

   student_id   A   B  count  average_A_rounded  average_B_rounded
0           1  38  12      3                 15                 12
1           2  21  14      2                 13                 19
2           4  28  32      1                 28                 32

Upvotes: 1

Related Questions