Reputation: 119
I have a dataframe formatted as follows:
Order# Product
1 Milk
1 Water
2 OJ
3 Soda
3 Lemonade
I want to reformat this so that there is 1 row per order, each product as a column, and a binary to indicate whether that order contained the product in the column:
Order# Lemonade Milk OJ Soda Water
1 0 1 0 0 1
2 0 0 1 0 0
3 1 0 0 1 0
I believe reshape2 or tidyverse are packages I could use, but I'm not sure. Any help on how to do this would be appreciated.
Upvotes: 1
Views: 63
Reputation: 887158
We can use data.table
as it would be more efficient and would work for bigger datasets
library(data.table)
dcast(setDT(df1), Order ~ Product, length)
# Order Lemonade Milk OJ Soda Water
#1: 1 0 1 0 0 1
#2: 2 0 0 1 0 0
#3: 3 1 0 0 1 0
Also, in case there are duplicates, change the length
with a logical expression
dcast(setDT(df1), Order ~ Product, function(x) as.integer(length(x) > 0))
For smaller datasets, the table
from base R
is also good
+(table(df1) > 0)
df1 <- structure(list(Order = c(1L, 1L, 2L, 3L, 3L), Product = c("Milk",
"Water", "OJ", "Soda", "Lemonade")), class = "data.frame",
row.names = c(NA, -5L))
Upvotes: 2
Reputation: 7592
df$v<-1
tidyr::spread(df,Product,v,fill=0)
Order Lemonade Milk OJ Soda Water
1 1 0 1 0 0 1
2 2 0 0 1 0 0
3 3 1 0 0 1 0
Upvotes: 1