Zizou

Reputation: 503

Duplicate rows based on other column values in R

I would like to repeat each row as many times as the value in its Count column. My code works fine on sample data, but on a large data set I get this error:

Error in rep(seq_len(dim(df1)[1]), df1$Count) : invalid 'times' argument

My data & code:

df1 <- data.frame(Month = rep(month.abb[1:12], 10),
                  Product = paste0('Product ', rep(LETTERS[1:10], each = 12)),
                  Count = sample(1:10, 120, replace = TRUE),
                  stringsAsFactors = FALSE)


df2 <- data.frame(df1[rep(seq_len(dim(df1)[1]), df1$Count), , drop = FALSE], row.names=NULL)

head(df2)
  Month   Product Count
1   Jan Product A     1
2   Feb Product A     4
3   Feb Product A     4
4   Feb Product A     4
5   Feb Product A     4
6   Mar Product A    10

My real data has 45,000 rows and 5 columns (4 character, 1 numeric), and with that data I get the error above.

Upvotes: 3

Views: 957

Answers (1)

deepseefan

Reputation: 3791

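The error means the times argument of rep() contains values it cannot use, typically NA or negative numbers in Count. You can handle those by replacing them with 1 before repeating the rows: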

times <- with(df1, ifelse(Count > 0 & !is.na(Count), Count, 1))
df2 <- data.frame(df1[rep(seq_len(dim(df1)[1]), times), , drop = FALSE], row.names = NULL)

Rows where Count is negative, zero, or NA are kept as they are: each is copied to df2 exactly once, without being repeated.
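
To see the effect on something small, here is a minimal sketch. The toy data frame and its values are made up for illustration (they are not from the question), and it assumes the problem in the real data is NA or negative counts, which cannot be confirmed without seeing that data.

```r
# Hypothetical toy data: Count contains an NA, a negative value and a zero.
toy <- data.frame(Product = c("A", "B", "C", "D"),
                  Count   = c(2, NA, -1, 0),
                  stringsAsFactors = FALSE)

# Flag the rows whose Count rep() would reject as 'times' (NA or negative).
bad <- is.na(toy$Count) | toy$Count < 0
toy[bad, ]

# Same rule as above: use Count when it is a positive number, otherwise 1.
times <- ifelse(toy$Count > 0 & !is.na(toy$Count), toy$Count, 1)

# Repeat each row index 'times' times, then reset the row names.
expanded <- toy[rep(seq_len(nrow(toy)), times), , drop = FALSE]
rownames(expanded) <- NULL
expanded
```

Inspecting toy[bad, ] first is optional, but on the real data it shows which rows trigger the error, so you can decide whether keeping them once is really what you want or whether they should be dropped instead.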

Upvotes: 4
