Rookatu

Reputation: 1507

For each dataframe row, add x rows where x is obtained from that row

I've got a data.frame to which I need to add rows, but the number of rows to add (and their content) is determined from the existing rows of the data.frame. I'd also like to end up with a column enumerating the rows of each duplicated group. Here is example data:

> A <- data.frame(veh = c("MINIVAN","HEAVY TRUCK"),age = c(2.5,3.5),rows_to_add = c(2,3))
> A
          veh age rows_to_add
1     MINIVAN 2.5           2
2 HEAVY TRUCK 3.5           3

and the desired output:

> B <- rbind(do.call("rbind",replicate(n=unique(A[1,"rows_to_add"])+1,A[1,],simplify = FALSE)),
+ do.call("rbind",replicate(n=unique(A[2,"rows_to_add"])+1,A[2,],simplify = FALSE)))
> B <- cbind(B,enum = c(0:2,0:3))
> B
           veh age rows_to_add enum
1      MINIVAN 2.5           2    0
2      MINIVAN 2.5           2    1
3      MINIVAN 2.5           2    2
24 HEAVY TRUCK 3.5           3    0
21 HEAVY TRUCK 3.5           3    1
22 HEAVY TRUCK 3.5           3    2
23 HEAVY TRUCK 3.5           3    3

Obviously, the code I've used here to generate the output is messy, non-scalable, and possibly inefficient. I'm looking for a general solution that handles a larger data.frame with reasonable speed and avoids loops (speeding up loop-laden code is part of the impetus for this question).

This question deals with a weaker version of the problem, in which the number of rows to add does not vary with the rows of the data itself and the rows to insert can contain NAs, but I saw no way to generalize the answer there.

How can I achieve the desired output in general?

Upvotes: 1

Views: 77

Answers (2)

markus

Reputation: 26343

A base R approach

out <- A[rep(seq_len(nrow(A)), A$rows_to_add + 1), ]
out
#            veh age rows_to_add
#1       MINIVAN 2.5           2
#1.1     MINIVAN 2.5           2
#1.2     MINIVAN 2.5           2
#2   HEAVY TRUCK 3.5           3
#2.1 HEAVY TRUCK 3.5           3
#2.2 HEAVY TRUCK 3.5           3
#2.3 HEAVY TRUCK 3.5           3

Add the new column the way @thelatemail suggested in the comments

out$enum <- sequence(A$rows_to_add + 1) - 1  # one 0..n run per original row, also when counts repeat
#out <- transform(out, enum = ave(age, rows_to_add, FUN = seq_along) - 1) # my slower attempt
#            veh age rows_to_add enum
#1       MINIVAN 2.5           2    0
#1.1     MINIVAN 2.5           2    1
#1.2     MINIVAN 2.5           2    2
#2   HEAVY TRUCK 3.5           3    0
#2.1 HEAVY TRUCK 3.5           3    1
#2.2 HEAVY TRUCK 3.5           3    2
#2.3 HEAVY TRUCK 3.5           3    3
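
To check that the base R idea holds when several rows share the same rows_to_add, here is a minimal sketch. The helper name expand_rows and the extra "SEDAN" row are made up for illustration and are not part of the original question or answer. It passes the per-row counts straight to sequence(), so each original row gets its own 0..n run.

```r
# Hypothetical helper (name is an assumption, not from the thread):
# repeat each row n + 1 times and number the copies 0..n per original row.
expand_rows <- function(df, n) {
  idx <- rep(seq_len(nrow(df)), n + 1)   # row i appears n[i] + 1 times
  out <- df[idx, , drop = FALSE]
  out$enum <- sequence(n + 1) - 1        # 0..n[1], 0..n[2], ...
  rownames(out) <- NULL                  # drop the 1, 1.1, 1.2, ... names
  out
}

# Illustrative data: the third row repeats the count of the first.
A2 <- data.frame(veh = c("MINIVAN", "HEAVY TRUCK", "SEDAN"),
                 age = c(2.5, 3.5, 1.0),
                 rows_to_add = c(2, 3, 2))

res <- expand_rows(A2, A2$rows_to_add)
res$enum
# 0 1 2 0 1 2 3 0 1 2
```

Because rep() and sequence() are both driven by the same vector of counts, the enumeration always lines up with the replicated rows, without any grouping step.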

A potentially faster alternative with data.table

library(data.table)
setDT(A)
out <- A[rep(seq_len(dim(A)[1]), A[, rows_to_add] + 1)
         ][, enum := sequence(A$rows_to_add + 1) - 1]
out

Upvotes: 2

Shree

Reputation: 11150

You can use uncount() from tidyr:

library(dplyr)
library(tidyr)

A %>% 
  uncount(weights = rows_to_add + 1, .id = "enum") %>%
  mutate(
    enum = enum - 1
  )

          veh age rows_to_add enum
1     MINIVAN 2.5           2    0
2     MINIVAN 2.5           2    1
3     MINIVAN 2.5           2    2
4 HEAVY TRUCK 3.5           3    0
5 HEAVY TRUCK 3.5           3    1
6 HEAVY TRUCK 3.5           3    2
7 HEAVY TRUCK 3.5           3    3

Upvotes: 0
