Reputation: 45
Say I have a data frame that looks like this:
playerID yearID salary
1 abbotje01 1998 175000
2 abbotje01 1999 255000
3 abbotje01 2000 255000
4 abbotje01 2001 300000
5 abbotku01 1993 109000
6 abbotku01 1994 109000
.
.
.
How can I get a data frame that assigns each unique playerID the salary from the most recent year, like this:
playerID yearID salary
1 abbotje01 1998 300000
2 abbotje01 1999 300000
3 abbotje01 2000 300000
4 abbotje01 2001 300000
5 abbotku01 1993 109000
6 abbotku01 1994 109000
I want to keep every row for each playerID, but assign all of a player's rows the same salary.
Upvotes: 0
Views: 33
Reputation: 887048
After grouping by 'playerID', get the index of the maximum 'yearID' with which.max, use it to extract the corresponding 'salary', and update the 'salary' column with mutate:
library(dplyr)
df1 %>%
  group_by(playerID) %>%
  mutate(salary = salary[which.max(yearID)])
# A tibble: 6 x 3
# Groups: playerID [2]
# playerID yearID salary
# <chr> <int> <int>
#1 abbotje01 1998 300000
#2 abbotje01 1999 300000
#3 abbotje01 2000 300000
#4 abbotje01 2001 300000
#5 abbotku01 1993 109000
#6 abbotku01 1994 109000
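To make the indexing step concrete, here is a minimal base R sketch of what `salary[which.max(yearID)]` evaluates to inside a single group. The two vectors are copied by hand from the abbotje01 rows of the question and are only illustrative. `which.max` returns the position of the first maximum and skips missing values, so this assumes each player has one row per year.

```r
# Values for one player (abbotje01), copied from the question
yearID <- c(1998L, 1999L, 2000L, 2001L)
salary <- c(175000L, 255000L, 255000L, 300000L)

# Position of the most recent year within the group
idx <- which.max(yearID)

# Salary recorded in that year; mutate() recycles this single
# value across all rows of the group
latest_salary <- salary[idx]
```

Because the lookup is by position of the latest year, the rows do not need to be sorted beforehand.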
Or using data.table:
library(data.table)
setDT(df1)[, salary := salary[which.max(yearID)], by = playerID]
Data:
df1 <- structure(list(playerID = c("abbotje01", "abbotje01", "abbotje01",
"abbotje01", "abbotku01", "abbotku01"), yearID = c(1998L, 1999L,
2000L, 2001L, 1993L, 1994L), salary = c(175000L, 255000L, 255000L,
300000L, 109000L, 109000L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
Upvotes: 1
Reputation: 388907
We can order the data frame by yearID and then extract the last salary from each group. This can be done in base R:
df <- df[with(df, order(playerID, yearID)), ]
df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) x[length(x)]))
#Also
#df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) tail(x, 1)))
df
# playerID yearID salary final_salary
#1 abbotje01 1998 175000 300000
#2 abbotje01 1999 255000 300000
#3 abbotje01 2000 255000 300000
#4 abbotje01 2001 300000 300000
#5 abbotku01 1993 109000 109000
#6 abbotku01 1994 109000 109000
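The ordering step above changes the row order of df. If the original order should be preserved, one possible alternative (a sketch, not part of the approach above) is to pass row indices to ave and look up the salary at the latest year inside each group. The sample data below is hypothetical: it reuses the question's values but is deliberately shuffled.

```r
# Hypothetical sample data, deliberately not sorted by yearID
df <- data.frame(
  playerID = c("abbotje01", "abbotku01", "abbotje01",
               "abbotje01", "abbotku01", "abbotje01"),
  yearID   = c(2001L, 1994L, 1998L, 2000L, 1993L, 1999L),
  salary   = c(300000L, 109000L, 175000L, 255000L, 109000L, 255000L)
)

# Group the row indices rather than the salaries, then pick the
# salary belonging to the largest yearID within each group
df$final_salary <- with(df, ave(seq_along(salary), playerID,
  FUN = function(i) salary[i][which.max(yearID[i])]))
```

The rows stay in their original positions, and each row of a player receives that player's most recent salary.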
In dplyr:
library(dplyr)
df %>%
  arrange(playerID, yearID) %>%
  group_by(playerID) %>%
  mutate(final_salary = last(salary))
And in data.table:
library(data.table)
setDT(df)
df[order(yearID), final_salary := last(salary), playerID]
Upvotes: 0