Reputation: 1850
I have a data.table with the following info:
data.table(id = c(rep(1,5)),
year = c(rep(2015,3), rep(2016,2)),
class = c(rep("A", 3), rep("B", 2)),
origin = c("Europe", "Asia", "Africa", "Europe", "Asia"),
count = c(30299, 3, 34, 2, 800))
id year class origin count
1: 1 2015 A Europe 30299
2: 1 2015 A Asia 3
3: 1 2015 A Africa 34
4: 1 2016 B Europe 2
5: 1 2016 B Asia 800
However, fofr every id
, year
, class
only one location is admissible. Here, the first combination has three locations:
1: 1 2015 A Europe 30299
2: 1 2015 A Asia 3
3: 1 2015 A Africa 34
and the second combination has two locations:
4: 1 2016 B Europe 2
5: 1 2016 B Asia 800
I want to change the locations, such that for every id
, year
, class
combination the location with the highest count
will be used. This should result in a table like this:
id year class origin count
1: 1 2015 A Europe 30299
2: 1 2015 A Europe 3
3: 1 2015 A Europe 34
4: 1 2016 B Asia 2
5: 1 2016 B Asia 800
How can this be achieved? I was thinking of splitting tha data table in a list fo lists and then applying lapply, but i am sure there is a better/simpßler solution? any tipps?
Upvotes: 1
Views: 30
Reputation: 78917
You can use which
also with dplyr
workflow. which
solution already posted by sindri_baldur (credits to him)
library(dplyr)
df %>%
group_by(id, year, class) %>%
mutate(origin = origin[which.max(count)])
Output:
id year class origin count
<dbl> <dbl> <chr> <chr> <dbl>
1 1 2015 A Europe 30299
2 1 2015 A Europe 3
3 1 2015 A Europe 34
4 1 2016 B Asia 2
5 1 2016 B Asia 800
Upvotes: 0
Reputation: 33498
DT[, origin := origin[which.max(count)], by = .(id, year, class)]
Upvotes: 3