Nick

How to assign each instance of a factor a specific value?

Say I have a data frame that looks like this:

 playerID    yearID salary
1 abbotje01   1998 175000
2 abbotje01   1999 255000
3 abbotje01   2000 255000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000
.
.
.

How can I get a data frame that assigns every row of each playerID the salary from that player's most recent year, like this:

 playerID    yearID salary
1 abbotje01   1998 300000
2 abbotje01   1999 300000
3 abbotje01   2000 300000
4 abbotje01   2001 300000
5 abbotku01   1993 109000
6 abbotku01   1994 109000

I want to keep every row for each playerID, but reassign all of them the same salary.

Upvotes: 0

Views: 33

Answers (2)

akrun

After grouping by 'playerID', get the index of the maximum 'yearID' with which.max, use it to extract the corresponding 'salary', and update the 'salary' column with mutate:

library(dplyr)
df1 %>%
  group_by(playerID) %>%
  mutate(salary = salary[which.max(yearID)])
# A tibble: 6 x 3
# Groups:   playerID [2]
#  playerID  yearID salary
#  <chr>      <int>  <int>
#1 abbotje01   1998 300000
#2 abbotje01   1999 300000
#3 abbotje01   2000 300000
#4 abbotje01   2001 300000
#5 abbotku01   1993 109000
#6 abbotku01   1994 109000

Or, using data.table:

library(data.table)
setDT(df1)[, salary := salary[which.max(yearID)], playerID]

Data:

df1 <- structure(list(playerID = c("abbotje01", "abbotje01", "abbotje01", 
"abbotje01", "abbotku01", "abbotku01"), yearID = c(1998L, 1999L, 
2000L, 2001L, 1993L, 1994L), salary = c(175000L, 255000L, 255000L, 
300000L, 109000L, 109000L)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))
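For readers without dplyr or data.table, the same which.max lookup can be sketched in base R. This is a minimal sketch, not part of the original answer: it rebuilds the sample data above, and the variable name `latest_idx` is hypothetical. `ave()` maps every row to the row index of its player's latest year, and that index is then used to pull the salary.

```r
# Sample data from the question (same values as the dput() above)
df1 <- data.frame(
  playerID = c("abbotje01", "abbotje01", "abbotje01", "abbotje01",
               "abbotku01", "abbotku01"),
  yearID = c(1998L, 1999L, 2000L, 2001L, 1993L, 1994L),
  salary = c(175000L, 255000L, 255000L, 300000L, 109000L, 109000L)
)

# Hypothetical variable: for each row, the index of the row holding that
# player's maximum yearID (which.max returns the first match on ties)
latest_idx <- ave(seq_len(nrow(df1)), df1$playerID,
                  FUN = function(i) i[which.max(df1$yearID[i])])

# Overwrite salary with the salary from each player's latest year
df1$salary <- df1$salary[latest_idx]
print(df1)
```

Like the dplyr and data.table versions, this does not require the rows to be sorted, because the lookup uses which.max on yearID rather than row position.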

Upvotes: 1

Ronak Shah

We can order the data frame by playerID and yearID, then extract the last salary in each group; after sorting, that is the salary from the most recent year.

In base R:

df <- df[with(df, order(playerID, yearID)), ]
df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) x[length(x)]))
#Also
#df$final_salary <- with(df, ave(salary, playerID, FUN = function(x) tail(x, 1)))

df

#   playerID yearID salary final_salary
#1 abbotje01   1998 175000       300000
#2 abbotje01   1999 255000       300000
#3 abbotje01   2000 255000       300000
#4 abbotje01   2001 300000       300000
#5 abbotku01   1993 109000       109000
#6 abbotku01   1994 109000       109000
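The sort step is what makes "last" mean "most recent": `ave()` works through each group in its current row order. The minimal sketch below assumes a hypothetical shuffled copy of the question's data and compares the result with and without sorting; the names `last_salary`, `unsorted`, and `sorted` are illustrative, not from the answer.

```r
# Hypothetical shuffled input: rows are not in year order
df <- data.frame(
  playerID = c("abbotje01", "abbotku01", "abbotje01", "abbotje01",
               "abbotku01", "abbotje01"),
  yearID = c(2001L, 1994L, 1998L, 2000L, 1993L, 1999L),
  salary = c(300000L, 109000L, 175000L, 255000L, 109000L, 255000L)
)

last_salary <- function(x) x[length(x)]

# Without sorting, "last" is the last row in file order, not the latest year
unsorted <- with(df, ave(salary, playerID, FUN = last_salary))

# After sorting by player and year, the last row of each group is the latest year
df_sorted <- df[with(df, order(playerID, yearID)), ]
sorted <- with(df_sorted, ave(salary, playerID, FUN = last_salary))

print(unsorted)
print(sorted)
```

Here abbotje01's final row in file order is from 1999, so the unsorted version wrongly picks 255000 instead of 300000.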

In dplyr:

library(dplyr)
df %>%
  arrange(playerID, yearID) %>%
  group_by(playerID) %>%
  mutate(final_salary = last(salary))

And in data.table:

library(data.table)

setDT(df)
df[order(yearID), final_salary := last(salary), playerID]

Upvotes: 0
