Reshaping a data frame based on column names

Question

I have a data frame consisting of 1 observation and 136 variables. Each variable names a unique pair of sets, and its value is the convergence score between the two sets in that pair. A simplified version of the original df looks like this:

#Original df
mydf <- data.frame(setA_setB = c(11), setA_setC = c(21), setB_setC = c(31))
mydf

What I am trying to get is a data frame looking like this:

#Final df
final.mydf <- data.frame(set = c("setA", "setB", "setC"), setA = c(NA, 11, 21), setB = c(11, NA, 31), setC = c(21, 31, NA))
final.mydf

So the first step is to derive the row and column names by splitting the colnames of mydf at "_", which I have been doing with the following code:

#List of set names:
setNames <- unique(unlist(strsplit(colnames(mydf), "_")))

Then, I don't know how to proceed in order to assign to each entry of the matrix the correct value based on the column name.

TooYoung · Accepted Answer

I suggest the cast function from the reshape package. First, reshape your data frame into long form, with one row per pair of sets:

# Split each column name at "_" and bind the two set names to the transposed values
redf <- data.frame(cbind(do.call(rbind, strsplit(names(mydf), "_")), t(mydf)),
                   stringsAsFactors = FALSE)
names(redf) <- c("set1","set2","value")
redf
#           set1 set2 value
# setA_setB setA setB    11
# setA_setC setA setC    21
# setB_setC setB setC    31

The first two columns are the two sets and the third column is the corresponding value. Since you want a symmetric ("two-way") matrix, every pair must also appear in the reverse order, so we make a copy with the set1 and set2 labels swapped:

invdf <- subset(redf,set1!=set2)
names(invdf) <- c("set2","set1","value")
invdf
#           set2 set1 value
# setA_setB setA setB    11
# setA_setC setA setC    21
# setB_setC setB setC    31

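Finally, combine the two data frames and use cast. Because rbind matches data frame columns by name, the swapped labels in invdf contribute each pair in reverse order: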

library(reshape)
alldf <- rbind(redf, invdf)
alldf$value <- as.numeric(alldf$value)  # cbind() above stored the values as character
alldf
cast(alldf, set1 ~ set2, sum)
#   set1 setA setB setC
# 1 setA    0   11   21
# 2 setB   11    0   31
# 3 setC   21   31    0
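
Note that the cast output above puts 0 on the diagonal (the sum of an empty group), whereas the question asked for NA there. As an alternative, here is a minimal base R sketch that fills a symmetric matrix directly and leaves the diagonal as NA. It is not part of the reshape approach; the names pairs, sets, vals, m and final are illustrative, and it assumes every column name contains exactly one "_" separating the two set names.

```r
# Simplified input from the question
mydf <- data.frame(setA_setB = 11, setA_setC = 21, setB_setC = 31)

# One row per column of mydf: the two set names obtained by splitting at "_"
pairs <- do.call(rbind, strsplit(names(mydf), "_", fixed = TRUE))
sets  <- sort(unique(as.vector(pairs)))

# The single observation as a plain numeric vector, in column order
vals <- unlist(mydf[1, ], use.names = FALSE)

# Start from an all-NA square matrix labelled by set name
m <- matrix(NA_real_, nrow = length(sets), ncol = length(sets),
            dimnames = list(sets, sets))

# A two-column character matrix indexes a matrix by (row name, column name)
m[pairs] <- vals                          # entries (set1, set2)
m[pairs[, 2:1, drop = FALSE]] <- vals     # mirrored entries (set2, set1)

# Same layout as final.mydf in the question
final <- data.frame(set = sets, m, row.names = NULL,
                    check.names = FALSE, stringsAsFactors = FALSE)
final
#    set setA setB setC
# 1 setA   NA   11   21
# 2 setB   11   NA   31
# 3 setC   21   31   NA
```

This avoids the intermediate long-format data frame, which may be convenient with 136 columns, but it silently overwrites values if the same pair appears twice, so check for duplicates first if that is possible in your data.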
