Long to wide format with several duplicates. Circumvent with unique combo of columns

Question

I have a dataset similar to this (the real one is much bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are many different combinations of time, drug, unit and admin. Only the combination of time, drug, unit and admin is unique, and each combination should occur only once per id. I could not find a solution to this. I would like R to create one column per unique combination so the data can be transformed to wide format. I have tried

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

and also a combination with

%>% expand(nesting(time, drug, unit, admin, dose), id)

but it doesn't work. Here is mock data:

id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
library(data.table)
# build the table column-wise so dose stays numeric (rbind() would coerce everything to character)
df<-data.table(id,time,drug,unit,admin,dose)

I would like my output to be something like this (the reason for the TRUE in the Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit are missing there, but because the drug is listed it was given in the standard dose and unit):

id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)

GordonShumway · Accepted Answer

I agree with the comments that long format is usually the better way to go. If you have to use wide format, then using the tidyr package you can do the following:

library(tidyr)
df %>% 
  unite(combination, time, drug, unit, admin) %>% 
  spread(key = combination, value  = dose)
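
For what it's worth, here is a rough sketch of the same idea with pivot_wider(), which newer tidyr versions recommend over spread(). It is not part of the original answer. It assumes the mock data is rebuilt as a plain data frame with a numeric dose, and the names wide, flags and listed are just illustrative. Because a listed drug with a missing dose looks the same as a drug that was never given once the data is wide, the second call builds a separate table of TRUE/FALSE flags, which may be one way to get the TRUE markers asked for in the question.

```r
library(tidyr)

# Mock data from the question as a plain data frame (assumption: dose is numeric).
df <- data.frame(
  id    = c(1492, 1492, 1492, 1492, 1493, 1493),
  time  = c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP"),
  drug  = c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC"),
  unit  = c(NA, "ml/hr", "ml", "mg", NA, NA),
  admin = c(NA, "IV", "IV", "Inhale", NA, NA),
  dose  = c(NA, 50, 40, 5, NA, NA)
)

# One row per id, one column per time/drug/unit/admin combination.
# Missing unit/admin values show up as the text "NA" in the column names.
wide <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = c(time, drug, unit, admin),
  values_from = dose,
  names_sep   = "_"
)

# Hypothetical helper column: marks every row that is listed at all,
# so listed-but-no-dose can be told apart from not-given.
df$listed <- TRUE
flags <- pivot_wider(
  df,
  id_cols     = id,
  names_from  = c(time, drug, unit, admin),
  values_from = listed,
  values_fill = FALSE,
  names_sep   = "_"
)
```

Keeping the doses and the flags in two tables avoids mixing numbers and logicals in the same column; they can be joined on id afterwards if a single table is needed.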
