Reputation: 37
I have a pandas df like this
student_id | A | B |
---|---|---|
1 | 3 | 13 |
2 | 4 | 23 |
1 | 5 | 12 |
4 | 28 | 32 |
1 | 38 | 12 |
2 | 21 | 14 |
My desired output: I want to drop the duplicates, and count how many duplicates there are according to student_id and keeping the last record/row and append the count column as new column, also average the duplicated rows entry in A and B as new columns
student_id | A | B | count | average A rounded | average B rounded |
---|---|---|---|---|---|
1 | 38 | 12 | 3 | 15 | 12 |
2 | 21 | 14 | 2 | 13 | 19 |
4 | 28 | 32 | 1 | 28 | 32 |
Upvotes: 2
Views: 58
Reputation: 41327
You can use named aggregation:
df.groupby('student_id', as_index=False).agg(
A=('A', 'last'),
B=('B', 'last'),
count=('student_id', 'count'),
average_A_rounded=('A', lambda x: np.mean(x).round()),
average_B_rounded=('B', lambda x: np.mean(x).round()),
)
# student_id A B count average_A_rounded average_B_rounded
# 0 1 38 12 3 15 12
# 1 2 21 14 2 12 18
# 2 4 28 32 1 28 32
Upvotes: 2
Reputation: 195468
I see that you want round the values "half-up". So to extend the @tdy answer:
def round_half_up(x):
mask = x >= 0
out = np.empty_like(x)
out[mask] = np.floor(x[mask] + 0.5)
out[~mask] = np.ceil(x[~mask] - 0.5)
return out
df = df.groupby("student_id", as_index=False).agg(
A=("A", "last"),
B=("B", "last"),
count=("A", "count"),
average_A_rounded=("A", "mean"),
average_B_rounded=("B", "mean"),
)
print(df.apply(round_half_up).astype(int))
Prints:
student_id A B count average_A_rounded average_B_rounded
0 1 38 12 3 15 12
1 2 21 14 2 13 19
2 4 28 32 1 28 32
Upvotes: 1