plausibly_exogenous
plausibly_exogenous

Reputation: 512

Efficiently Repeating Observations by Group

I am trying to find an efficient way to repeat rows by group in data.table for only some groups. Please consider the following example:

library(data.table) 

DT <- data.table(x = c("A","A", "B", "B", "C","C", "D","D"),
                 y = 1:8)

This dataset looks like:

head(DT)
   x y
1: A 1
2: A 2
3: B 3
4: B 4
5: C 5
6: C 6

Say I have a separate vector rep <- c("A", "A", "A", "B", "B", "C"). Given this vector, I want to be able to repeat all rows of A three times (due to the cardinality of the "A" characters in rep) and all rows associated with B two times. Thus, the final dataset should like:

    x y
1:  A 1
2:  A 2
3:  A 1
4:  A 2
5:  A 1
6:  A 2
7:  B 3
8:  B 4
9:  B 3
10: B 4
11: C 5
12: C 6

Notice that I did not repeat "C" because the cardinality of "C" is only 1 in rep. I have a hackish way of doing this procedure at the moment, but I'm wondering if there was a more efficient data.table way of doing the above.

Thank you!

P.S. The reason I am doing this is because I am doing some matching with replacement in my regressions and sometimes, the same control firm is assigned to more than one treatment firm.

Upvotes: 1

Views: 223

Answers (3)

akrun
akrun

Reputation: 887911

We can do

DT[ data.table(x = v1)[, .N, x], on = .(x)][rep(seq_len(.N), N)]

Or to return in the same order

 DT[, .(y = list(y)), x][data.table(x = v1), on = .(x)][, .(x, y = unlist(y))]

data

v1 <- c("A", "A", "A", "B", "B", "C")

Upvotes: 0

IRTFM
IRTFM

Reputation: 263481

A data.table merge won't give you the same ordering but you aren't supposed to rely on ordering in datatables, anyway:

merge(DT, data.frame(x=rep), by="x")

    x y
 1: A 1
 2: A 1
 3: A 1
 4: A 2
 5: A 2
 6: A 2
 7: B 3
 8: B 3
 9: B 4
10: B 4
11: C 5
12: C 6

Upvotes: 3

Carey Caginalp
Carey Caginalp

Reputation: 432

One solution is to gather up the counts and left join onto them:

library(data.table) 
library(data.table)

DT <- data.table(x = c("A","A", "B", "B", "C","C", "D","D"),
                 y = 1:8)
rep_vec <- c("A", "A", "A", "B", "B", "C")
rep_DT <- DT %>% 
  left_join(data.frame(group = rep_vec), by = c("x" = "group"))

Are you sure duplicating rows in a dataframe is your ideal option though?

Upvotes: 1

Related Questions