Reputation: 23
Basically for each id i have a set of product id, and i tried to spread them across a set of define columns. each id can only have 5 product_id. ex:
id product_id
1 305
1 402
2 200
1 305
3 402
3 402
so I spread as a binary outcome like:
id 305 402 200
1 2 0 0
2 0 0 1
3 0 2 0
but i would like:
id product1 product2 product3 product4... until 5
1 305 305 0
2 200 0 0
3 402 402 0
if someone have something clean ( I have around 10K rows) That would be awesome!! thanks!
#this gives me the binary outcome
for (i in names(test2[2:18])) {
test2$product1[test2[i] == 1 ] <- i
}
#this is a try to iterate through each row but it s pretty bad
for(i in 1:nrow(test2)){
if(test2[i,1]== 1){
test2$product1[i] <- colnames(test2[1])
} else if(test2[i,1]==2){
test2$product1[i] <- colnames(test2[1])
test2$product2[i] <- colnames(test2[1])
} else if(test2[i,1]==3){
test2$product1[i] <- colnames(test2[1])
test2$product2[i] <- colnames(test2[1])
test2$product3[i] <- colnames(test2[1])
} else if(test2[i,1]==4){
and so one...
expected:
id product1 product2 product3 product4... until 5
1 305 305 0
2 200 0 0
3 402 402 0
actual:
id 305 402 200
1 2 0 0
2 0 0 1
3 0 2 0
Upvotes: 2
Views: 41
Reputation: 887511
We could create a sequence column by 'id' and then spread
. Note that simply spread
ing will not have all the 'product' until 5 as these are missing in the data. Inorder to do that, create the sequence as a factor
with levels
specified from 'product1' to 'product5' and in the spread
, specify the drop = FALSE
for not dropping the unused levels
library(tidyverse)
df1 %>%
group_by(id) %>%
mutate(product = factor(paste0('product', row_number()),
levels = paste0('product', 1:5))) %>%
spread(product, product_id, drop = FALSE, fill = 0)
# A tibble: 3 x 6
# Groups: id [3]
# id product1 product2 product3 product4 product5
# <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 305 402 305 0 0
#2 2 200 0 0 0 0
#3 3 402 402 0 0 0
df1 <- structure(list(id = c(1L, 1L, 2L, 1L, 3L, 3L), product_id = c(305L,
402L, 200L, 305L, 402L, 402L)), class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1