Reputation: 503
I would like to duplicate each row as many times as the value in its Count column. On sample data my code works fine, but when I run it on a large data set, I get this error:
Error in rep(seq_len(dim(df1)[1]), df1$Count) : invalid 'times' argument
My data & code:
df1 <- data.frame(Month = rep(month.abb[1:12], 10),
                  Product = paste0('Product ', rep(LETTERS[1:10], each = 12)),
                  Count = sample(1:10, 120, replace = TRUE),
                  stringsAsFactors = FALSE)
df2 <- data.frame(df1[rep(seq_len(dim(df1)[1]), df1$Count), , drop = FALSE], row.names = NULL)
head(df2)
  Month   Product Count
1   Jan Product A     1
2   Feb Product A     4
3   Feb Product A     4
4   Feb Product A     4
5   Feb Product A     4
6   Mar Product A    10
My real data has 45,000 rows and 5 columns (4 character, 1 numeric), and on that data I get the error above.
Upvotes: 3
Views: 957
Reputation: 3791
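You can do it this way. The error usually means that Count contains values rep() cannot use as repeat counts, such as negative numbers or NA. This version handles both.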
df2 <- data.frame(df1[rep(seq_len(dim(df1)[1]),
                          with(df1, ifelse(Count > 0 & !is.na(Count), Count, 1))),
                      , drop = FALSE], row.names = NULL)
Rows where Count is negative, zero, or NA are kept as they are (that is, they are copied to df2 once, without repeats).
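To see which rows trigger the error before patching them, a small check along these lines may help. This is only a sketch on made-up data: the `toy` data frame, its values, and the helper names `bad`, `res`, `times`, and `fixed` are assumptions for illustration, not the asker's real data.

```r
# Hypothetical toy data: two valid counts, one NA, one negative.
toy <- data.frame(Product = c("A", "B", "C", "D"),
                  Count = c(2, NA, -1, 3),
                  stringsAsFactors = FALSE)

# Flag rows whose Count rep() cannot use as a repeat count.
bad <- is.na(toy$Count) | toy$Count < 0
toy[bad, ]

# The unguarded call fails with "invalid 'times' argument".
res <- try(rep(seq_len(nrow(toy)), toy$Count), silent = TRUE)
inherits(res, "try-error")

# Guarded version, as in the answer: NA and non-positive counts become 1.
times <- with(toy, ifelse(Count > 0 & !is.na(Count), Count, 1))
fixed <- data.frame(toy[rep(seq_len(nrow(toy)), times), , drop = FALSE],
                    row.names = NULL)
nrow(fixed)  # 2 + 1 + 1 + 3 = 7 rows
```

Running the `bad` check on the real data first shows how many rows are affected, which helps decide whether copying them once is the right choice or whether they should be dropped instead.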
Upvotes: 4