User LG
User LG

Reputation: 307

Long to wide format with several duplicates. Circumvent with unique combo of columns

I have a dataset similar to this (real one is way bigger). It is in long format and I need to change it to wide format with one row per id. My problem is that there are a lot of different combinations of time, drug, unit and admin. Only a combination of time, drug, unit and admin will be unique and should only occur once pr id. I could not find a solution to this. I would like R to create unique combinations of columns so the data can be transformed to wide format. I have tried

melt.data.table(df, id.vars=c(id,time,drug,unit,admin), measure.vars = c(dose), na.rm=F)

and also a combination with

%>% expand(nesting(time, drug, unit, admin, dose), id)

but it doesn't work. Here is mock data:

id<-c(1492,1492,1492,1492,1493,1493)
time<-c("Pre-bypass","Post-bypass","Total","Post-bypass","Pre-OP","Pre-OP")
drug<-c("ACE","LEVO","LEVO","MIL","BB","BC")
unit<-c(NA,"ml/hr","ml","mg",NA,NA)
admin<-c(NA, "IV","IV","Inhale",NA,NA)
dose<-c(NA,50,40,5,NA,NA)
df<-rbind(id,time,drug,unit,admin,dose)
df<-t(df)
df<-as.data.table(df)

I would like my output to be something like this (the reason for the TRUE in Pre.bypass.Ace.unitNA.adminNA and Pre.OP columns is that dose and unit is missing here but because it is listed it is given in standard dose and unit:

id.new<-c(1492,1493)
Pre.OP.BB.unitNA.adminNA<-c(NA,TRUE)
Pre.OP.BC.unitNA.adminNA<-c(NA,TRUE)
Total.LEVO.ml.h.IV<-c(40,NA)
Pre.bypass.Ace.unitNA.adminNA<-c(TRUE,NA)
Post.bypass.LEVO.ml.h.IV<-c(50,NA)
Post.bypass.MIL.ml.h.IV<-c(5,NA)
df.new<-rbind(id.new,Post.bypass.MIL.ml.h.IV,Pre.OP.BB.unitNA.adminNA,Pre.OP.BC.unitNA.adminNA,Total.LEVO.ml.h.IV,Pre.bypass.Ace.unitNA.adminNA,Post.bypass.LEVO.ml.h.IV)
df.new<-t(df.new)

Upvotes: 1

Views: 1053

Answers (2)

chasemc
chasemc

Reputation: 869

library(data.table)
id <- c(1492, 1492, 1492, 1492, 1493, 1493)
time <- c("Pre-bypass", "Post-bypass", "Total", "Post-bypass", "Pre-OP", "Pre-OP")
drug <- c("ACE", "LEVO", "LEVO", "MIL", "BB", "BC")
unit <- c(NA, "ml/hr", "ml", "mg", NA, NA)
admin <- c(NA, "IV", "IV", "Inhale", NA, NA)
dose <- c(NA, 50, 40, 5, NA, NA)
df <- rbind(id, time, drug, unit, admin, dose)
df <- t(df)
df <- as.data.table(df)
df
#>      id        time drug  unit  admin dose
#> 1: 1492  Pre-bypass  ACE    NA     NA   NA
#> 2: 1492 Post-bypass LEVO ml/hr     IV   50
#> 3: 1492       Total LEVO    ml     IV   40
#> 4: 1492 Post-bypass  MIL    mg Inhale    5
#> 5: 1493      Pre-OP   BB    NA     NA   NA
#> 6: 1493      Pre-OP   BC    NA     NA   NA

Convert to wide with data.table package function dcast

data.table::dcast(df, id ~ ..., value.var = "dose")
#>      id Post-bypass_LEVO_ml/hr_IV Post-bypass_MIL_mg_Inhale
#> 1: 1492                        50                         5
#> 2: 1493                        NA                        NA
#>    Pre-OP_BB_NA_NA Pre-OP_BC_NA_NA Pre-bypass_ACE_NA_NA Total_LEVO_ml_IV
#> 1:              NA              NA                   NA               40
#> 2:              NA              NA                   NA               NA

Upvotes: 1

GordonShumway
GordonShumway

Reputation: 2056

I agree with the comments that long format is usually the better way to go. If you have to use wide format the using the tidyr package you can do the following:

library(tidyr)
df %>% 
  unite(combination, time, drug, unit, admin) %>% 
  spread(key = combination, value  = dose)

Upvotes: 1

Related Questions