Reputation: 1507
I've got a data.frame to which I need to add rows, but the number of rows to add (and their content) is determined from the existing rows of the data.frame. I'd also like to end up with a column enumerating the rows of each duplicated group. Here is example data:
> A <- data.frame(veh = c("MINIVAN","HEAVY TRUCK"),age = c(2.5,3.5),rows_to_add = c(2,3))
> A
veh age rows_to_add
1 MINIVAN 2.5 2
2 HEAVY TRUCK 3.5 3
and the desired output:
> B <- rbind(do.call("rbind",replicate(n=unique(A[1,"rows_to_add"])+1,A[1,],simplify = FALSE)),
+ do.call("rbind",replicate(n=unique(A[2,"rows_to_add"])+1,A[2,],simplify = FALSE)))
> B <- cbind(B,enum = c(0:2,0:3))
> B
veh age rows_to_add enum
1 MINIVAN 2.5 2 0
2 MINIVAN 2.5 2 1
3 MINIVAN 2.5 2 2
24 HEAVY TRUCK 3.5 3 0
21 HEAVY TRUCK 3.5 3 1
22 HEAVY TRUCK 3.5 3 2
23 HEAVY TRUCK 3.5 3 3
Obviously the code I've used here to generate the output is messy, non-scalable, and possibly inefficient. I'm looking for a general solution that would let me do this on a larger data.frame with reasonable speed and without loops (speeding up loop-laden code is part of the impetus for this question).
This question deals with a weaker version of the problem, in which the number of rows to add does not vary with the rows of the data itself and the inserted rows can contain NAs, but I saw no way to generalize the answer there.
How can I achieve the desired output in general?
Upvotes: 1
Views: 77
Reputation: 26343
A base R approach:
out <- A[rep(seq_len(nrow(A)), A$rows_to_add + 1), ]
out
# veh age rows_to_add
#1 MINIVAN 2.5 2
#1.1 MINIVAN 2.5 2
#1.2 MINIVAN 2.5 2
#2 HEAVY TRUCK 3.5 3
#2.1 HEAVY TRUCK 3.5 3
#2.2 HEAVY TRUCK 3.5 3
#2.3 HEAVY TRUCK 3.5 3
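To see why the subsetting above works, it can help to look at the row index on its own. This is only a small sketch using the example data from the question; idx is a name introduced here for illustration.

```r
# Example data from the question
A <- data.frame(veh = c("MINIVAN", "HEAVY TRUCK"),
                age = c(2.5, 3.5),
                rows_to_add = c(2, 3))

# rep() repeats each row number (rows_to_add + 1) times:
# one copy for the original row plus the rows to add
idx <- rep(seq_len(nrow(A)), A$rows_to_add + 1)
idx
# [1] 1 1 1 2 2 2 2

# Indexing with repeated row numbers duplicates those rows
out <- A[idx, ]
nrow(out)
# [1] 7
```

Because the index is built in one vectorized call, no explicit loop over rows is needed.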
Add the new column the way @thelatemail suggested in the comments, passing sequence() one count per original row:
out$enum <- sequence(A$rows_to_add + 1) - 1
#out <- transform(out, enum = ave(age, rows_to_add, FUN = seq_along) - 1) # my slower attempt
# veh age rows_to_add enum
#1 MINIVAN 2.5 2 0
#1.1 MINIVAN 2.5 2 1
#1.2 MINIVAN 2.5 2 2
#2 HEAVY TRUCK 3.5 3 0
#2.1 HEAVY TRUCK 3.5 3 1
#2.2 HEAVY TRUCK 3.5 3 2
#2.3 HEAVY TRUCK 3.5 3 3
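For a larger data.frame, several rows will likely share the same rows_to_add value, so sequence() should receive one count per original row rather than a deduplicated set of counts. Here is a minimal sketch of both steps wrapped in a helper. The name expand_rows, its n_col argument, and the extra SEDAN row are made up for illustration and are not part of the question.

```r
# Hypothetical helper: repeat each row (n + 1) times and number the copies from 0
expand_rows <- function(df, n_col = "rows_to_add") {
  times <- df[[n_col]] + 1                      # copies per original row
  out <- df[rep(seq_len(nrow(df)), times), , drop = FALSE]
  out$enum <- sequence(times) - 1               # 0, 1, ..., n within each group
  rownames(out) <- NULL                         # plain 1..N row names
  out
}

# Assumed test data: two rows share rows_to_add = 2
A2 <- data.frame(veh = c("MINIVAN", "HEAVY TRUCK", "SEDAN"),
                 age = c(2.5, 3.5, 1.0),
                 rows_to_add = c(2, 3, 2))

B2 <- expand_rows(A2)
B2$enum
# [1] 0 1 2 0 1 2 3 0 1 2
```

The counts are taken from the original data.frame before expansion, so the enumeration restarts for every source row even when the counts repeat.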
A potentially faster alternative with data.table
library(data.table)
setDT(A)
out <- A[rep(seq_len(dim(A)[1]), A[, rows_to_add] + 1)
][, enum := sequence(A$rows_to_add + 1) - 1]
out
Upvotes: 2
Reputation: 11150
You need uncount from tidyr:
library(dplyr)
library(tidyr)
A %>%
uncount(weights = rows_to_add + 1, .id = "enum") %>%
mutate(
enum = enum - 1
)
veh age rows_to_add enum
1 MINIVAN 2.5 2 0
2 MINIVAN 2.5 2 1
3 MINIVAN 2.5 2 2
4 HEAVY TRUCK 3.5 3 0
5 HEAVY TRUCK 3.5 3 1
6 HEAVY TRUCK 3.5 3 2
7 HEAVY TRUCK 3.5 3 3
Upvotes: 0