Reputation: 512
I am trying to find an efficient way to repeat rows by group in data.table for only some groups. Please consider the following example:
library(data.table)
DT <- data.table(x = c("A","A", "B", "B", "C","C", "D","D"),
y = 1:8)
This dataset looks like:
head(DT)
x y
1: A 1
2: A 2
3: B 3
4: B 4
5: C 5
6: C 6
Say I have a separate vector rep <- c("A", "A", "A", "B", "B", "C")
. Given this vector, I want to be able to repeat all rows of A three times (due to the cardinality of the "A" characters in rep) and all rows associated with B two times. Thus, the final dataset should like:
x y
1: A 1
2: A 2
3: A 1
4: A 2
5: A 1
6: A 2
7: B 3
8: B 4
9: B 3
10: B 4
11: C 5
12: C 6
Notice that I did not repeat "C" because the cardinality of "C" is only 1 in rep
. I have a hackish way of doing this procedure at the moment, but I'm wondering if there was a more efficient data.table way of doing the above.
Thank you!
P.S. The reason I am doing this is because I am doing some matching with replacement in my regressions and sometimes, the same control firm is assigned to more than one treatment firm.
Upvotes: 1
Views: 223
Reputation: 887911
We can do
DT[ data.table(x = v1)[, .N, x], on = .(x)][rep(seq_len(.N), N)]
Or to return in the same order
DT[, .(y = list(y)), x][data.table(x = v1), on = .(x)][, .(x, y = unlist(y))]
v1 <- c("A", "A", "A", "B", "B", "C")
Upvotes: 0
Reputation: 263481
A data.table merge won't give you the same ordering but you aren't supposed to rely on ordering in datatables, anyway:
merge(DT, data.frame(x=rep), by="x")
x y
1: A 1
2: A 1
3: A 1
4: A 2
5: A 2
6: A 2
7: B 3
8: B 3
9: B 4
10: B 4
11: C 5
12: C 6
Upvotes: 3
Reputation: 432
One solution is to gather up the counts and left join onto them:
library(data.table)
library(data.table)
DT <- data.table(x = c("A","A", "B", "B", "C","C", "D","D"),
y = 1:8)
rep_vec <- c("A", "A", "A", "B", "B", "C")
rep_DT <- DT %>%
left_join(data.frame(group = rep_vec), by = c("x" = "group"))
Are you sure duplicating rows in a dataframe is your ideal option though?
Upvotes: 1