Reputation: 103
How can I combine rows in a data frame in R, as shown below, so that the numeric columns are summed for each name and the Status is taken from the row with the maximum value in the sum column?
For this input:
Name score1 score2 score3 sum Status
John 1 1 0 2 A
John 0 3 0 3 B
Smith 0 1 3 4 A
Sean 1 2 1 4 A
Sean 1 0 2 3 B
Sean 5 1 1 7 C
Carl 0 1 1 2 A
I expect to have this output:
Name score1 score2 score3 sum Status
John 1 4 0 5 B
Smith 0 1 3 4 A
Sean 7 3 4 14 C
Carl 0 1 1 2 A
Upvotes: 0
Views: 44
Reputation: 388907
We can calculate the sum and get the corresponding Status of the max sum for each Name.
library(dplyr)
df %>%
  group_by(Name) %>%
  summarise(Sum = sum(sum), Status = Status[which.max(sum)])
# Name Sum Status
# <fct> <int> <fct>
#1 Carl 2 A
#2 John 5 B
#3 Sean 14 C
#4 Smith 4 A
Or using the same logic with data.table
library(data.table)
setDT(df)[, .(Sum = sum(sum), Status = Status[which.max(sum)]), Name]
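The snippets above return only the total and the Status, while the expected output in the question also sums score1, score2 and score3. A minimal sketch of one way to get all of them with dplyr, assuming dplyr 1.0.0 or later (for `across()`); the name `result` is only illustrative, and the data frame is rebuilt inline so the block runs on its own. Status is computed first because later expressions in `summarise()` would otherwise see the already-summed `sum` column.

```r
library(dplyr)

# Same values as the question's input, rebuilt here so the example is self-contained
df <- data.frame(
  Name   = c("John", "John", "Smith", "Sean", "Sean", "Sean", "Carl"),
  score1 = c(1L, 0L, 0L, 1L, 1L, 5L, 0L),
  score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
  score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L),
  sum    = c(2L, 3L, 4L, 4L, 3L, 7L, 2L),
  Status = c("A", "B", "A", "A", "B", "C", "A")
)

result <- df %>%
  group_by(Name) %>%
  summarise(
    # Pick the Status of the row with the largest per-row sum,
    # before the sum column is replaced by its group total
    Status = Status[which.max(sum)],
    # Total every numeric column within each Name
    across(c(score1, score2, score3, sum), ~ sum(.x)),
    .groups = "drop"
  ) %>%
  # Move Status to the end to mirror the layout in the question
  relocate(Status, .after = last_col())

print(result)
```

Note that `which.max()` returns the first maximum, so if two rows of the same Name tie on sum, the Status of the earlier row is kept. The groups come back sorted by Name rather than in their original order.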
data
df <- structure(list(Name = structure(c(2L, 2L, 4L, 3L, 3L, 3L, 1L),
.Label = c("Carl","John", "Sean", "Smith"), class = "factor"), score1 = c(1L, 0L,
0L, 1L, 1L, 5L, 0L), score2 = c(1L, 3L, 1L, 2L, 0L, 1L, 1L),
score3 = c(0L, 0L, 3L, 1L, 2L, 1L, 1L), sum = c(2L, 3L, 4L,
4L, 3L, 7L, 2L), Status = structure(c(1L, 2L, 1L, 1L, 2L,
3L, 1L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame",
row.names = c(NA, -7L))
Upvotes: 1