Reputation: 2309
So I am trying to translate some dplyr code. I have tried to get help from a package that translates dplyr to data.table but it still does not work. The error is with row_number
from dplyr
..
I need all the steps in the dplyr
code (even though they don't make sense here with mtcars
)
library(dplyr)
library(dtplyr) # from https://github.com/tidyverse/dtplyr
library(data.table)
mtcars %>%
distinct(mpg, .keep_all = TRUE) %>%
group_by(am) %>%
arrange(mpg, .by_group = TRUE) %>%
mutate(row_num = LETTERS[row_number()]) %>%
ungroup()
# using dtplyr
dt <- lazy_dt(mtcars)
dt %>%
distinct(mpg, .keep_all = TRUE) %>%
group_by(am) %>%
arrange(mpg, .by_group = TRUE) %>%
mutate(row_num = LETTERS[row_number()]) %>%
ungroup() %>%
show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = c("A",
#> "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N",
#> "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z")[row_number()]),
#> keyby = .(am)]
# I then use the query from dtplyr
DT <- as.data.table(mtcars)
unique(DT, by = "mpg")[order(am, mpg)][, `:=`(row_num = c("A",
"B", "C", "D", "E", "F", "G",
"H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S",
"T", "U", "V", "W", "X", "Y",
"Z")[row_number()]), keyby = .(am)]
#> row_number() should only be called in a data context
Created on 2019-07-14 by the reprex package (v0.3.0)
Upvotes: 9
Views: 1523
Reputation: 7714
As the data.table
syntax is heavily critizised below are two versions of akrun answer with IMHO a clearer syntax.
I find much harder to follow data.table
code when it's piped with [
multiple times, specially when there is a :=
call (mutate
in dplyr).
library(data.table)
dt = as.data.table(mtcars)
dt = unique(dt, by = "mpg")
dt = dt[order(am, mpg)]
dt[, row_num:=LETTERS[seq_len(.N)], by=am]
dt[1:3]
mpg cyl disp hp drat wt qsec vs am gear carb row_num
1: 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4 A
2: 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4 B
3: 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4 C
Another option is using the %>%
pipe.
library(magrittr)
dt = as.data.table(mtcars)
dt = unique(dt, by = "mpg") %>%
.[order(am, mpg)] %>%
.[, row_num:=LETTERS[seq_len(.N)], by=am]
dt[1:3]
# mpg cyl disp hp drat wt qsec vs am gear carb row_num
# 1: 10.4 8 472 205 2.93 5.25 17.98 0 0 3 4 A
# 2: 13.3 8 350 245 3.73 3.84 15.41 0 0 3 4 B
# 3: 14.3 8 360 245 3.21 3.57 15.84 0 0 3 4 C
Upvotes: 1
Reputation: 103898
I'm experimenting with some tweaks to the translation so that dtplyr will automatically produce something more like what you want:
library(dtplyr)
library(dplyr, warn.conflicts = FALSE)
dt <- lazy_dt(mtcars)
dt %>%
distinct(mpg, .keep_all = TRUE) %>%
group_by(am) %>%
arrange(mpg, .by_group = TRUE) %>%
mutate(row_num = LETTERS[row_number()]) %>%
ungroup() %>%
show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[seq_len(.N)]),
#> keyby = .(am)]
Or avoiding the grouping as @MichaelChirico suggests:
dt %>%
distinct(mpg, .keep_all = TRUE) %>%
arrange(am, mpg) %>%
mutate(row_num = LETTERS[row_number(am)]) %>%
ungroup() %>%
show_query()
#> unique(`_DT1`, by = "mpg")[order(am, mpg)][, `:=`(row_num = ..LETTERS[frank(am,
#> ties.method = "first", na.last = "keep")])]
(Using the ..
in front of LETTERS
is a data.table feature that makes it clear that you're referring to a variable outside of the data frame; it's probably not necessary here but I think it's better to be safe than sorry.)
Upvotes: 8
Reputation: 34703
Might I recommend the rowid function? It does the grouping step "under the hood" you might find it looks cleaner:
unique(DT, by='mpg')[order(am, mpg), row_num := LETTERS[rowid(am)]]
if you love chaining, you could also get everything inside []
:
DT[ , .SD[1L], by = mpg
][order(am, mpg), row_num := LETTERS[rowid(am)]]
Upvotes: 8
Reputation: 886938
We can use seq_len(.N)
unique(DT, by = "mpg")[order(am, mpg)][,
`:=`(row_num = LETTERS[seq_len(.N)]), by = .(am)][]
Upvotes: 5