Reformat Dataframe With Unique Row Values as Columns

Question

I have a dataframe formatted as follows:

Order#     Product
1          Milk
1          Water
2          OJ
3          Soda
3          Lemonade

I want to reformat this so that there is 1 row per order, each product as a column, and a binary to indicate whether that order contained the product in the column:

Order#     Lemonade     Milk     OJ     Soda     Water
1          0            1        0      0        1
2          0            0        1      0        0
3          1            0        0      1        0

I believe reshape2 or tidyverse are packages I could use, but I'm not sure. Any help on how to do this would be appreciated.

akrun · Accepted Answer

We can use data.table as it would be more efficient and would work for bigger datasets

library(data.table)
dcast(setDT(df1), Order ~ Product, length)
#    Order Lemonade Milk OJ Soda Water
#1:     1        0    1  0    0     1
#2:     2        0    0  1    0     0
#3:     3        1    0  0    1     0

Also, in case there are duplicates, change the length with a logical expression

dcast(setDT(df1), Order ~ Product, function(x) as.integer(length(x) > 0))

For smaller datasets, the table from base R is also good

+(table(df1) > 0)

data

df1 <- structure(list(Order = c(1L, 1L, 2L, 3L, 3L), Product = c("Milk", 
 "Water", "OJ", "Soda", "Lemonade")), class = "data.frame",
  row.names = c(NA, -5L))

Reformat Dataframe With Unique Row Values as Columns

Answers (2)

data

Related Questions