Nneka
Nneka

Reputation: 1850

conditionally changing contents of a columns using max() in data.table in R

I have a data.table with the following info:

   data.table(id = c(rep(1,5)),
               year = c(rep(2015,3), rep(2016,2)), 
               class = c(rep("A", 3), rep("B", 2)),
               origin = c("Europe", "Asia", "Africa", "Europe", "Asia"), 
               count = c(30299, 3, 34, 2, 800))

   id year class origin count
1:  1 2015     A Europe 30299
2:  1 2015     A   Asia     3
3:  1 2015     A Africa    34
4:  1 2016     B Europe     2
5:  1 2016     B   Asia   800

However, fofr every id, year, class only one location is admissible. Here, the first combination has three locations:

1:  1 2015     A Europe 30299
2:  1 2015     A   Asia     3
3:  1 2015     A Africa    34

and the second combination has two locations:

4:  1 2016     B Europe     2
5:  1 2016     B   Asia   800

I want to change the locations, such that for every id, year, class combination the location with the highest count will be used. This should result in a table like this:

   id year class origin count
1:  1 2015     A Europe 30299
2:  1 2015     A Europe     3
3:  1 2015     A Europe    34
4:  1 2016     B   Asia     2
5:  1 2016     B   Asia   800

How can this be achieved? I was thinking of splitting tha data table in a list fo lists and then applying lapply, but i am sure there is a better/simpßler solution? any tipps?

Upvotes: 1

Views: 30

Answers (2)

TarJae
TarJae

Reputation: 78917

You can use which also with dplyr workflow. which solution already posted by sindri_baldur (credits to him)

library(dplyr)
df %>% 
  group_by(id, year, class) %>% 
  mutate(origin = origin[which.max(count)])

Output:

     id  year class origin count
  <dbl> <dbl> <chr> <chr>  <dbl>
1     1  2015 A     Europe 30299
2     1  2015 A     Europe     3
3     1  2015 A     Europe    34
4     1  2016 B     Asia       2
5     1  2016 B     Asia     800

Upvotes: 0

s_baldur
s_baldur

Reputation: 33498

DT[, origin := origin[which.max(count)], by = .(id, year, class)]

Upvotes: 3

Related Questions