Sylvia Rodriguez
Sylvia Rodriguez

Reputation: 1353

How to 'reverse melt' a data.frame?

I have data.frame df1 (see code below). I would like to convert it to what df2 looks like (see code below).

Maybe this can be done with reshape cast or reverse melt? But I do not understand these functions. Can anyone help, please?

 df1 <- data.frame(
   stringsAsFactors = FALSE,
                    sample = c("a","a","a",
                               "a","b","c","c","c","c","c","c","c","c",
                               "d","d","e","e","e","g","g"),
                    LETTER = c("P","R","V",
                               "Y","Q","Q","R","S","T","U","W","X","Z",
                               "Q","X","Q","V","X","Q","T")
        )

 df2 <- data.frame(
   stringsAsFactors = FALSE,
             sample = c("a", "b", "c", "d", "e", "f", "g"),
                  P = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
                  Q = c(0L, 1L, 1L, 1L, 1L, 0L, 1L),
                  R = c(1L, 0L, 1L, 0L, 0L, 0L, 0L),
                  S = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  T = c(0L, 0L, 1L, 0L, 0L, 0L, 1L),
                  U = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  V = c(1L, 0L, 0L, 0L, 1L, 0L, 0L),
                  W = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
                  X = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
                  Y = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
                  Z = c(0L, 0L, 1L, 0L, 0L, 0L, 0L)
        )

Edit

It was suggested that I look at this post: How to reshape data from long to wide format. Unfortunately, that does not answer my question. The equivalent code would be as follows and throws the following error.

 df2 <- reshape(df, idvar = "sample", timevar = "LETTER", direction = "wide")
 Error in data[, timevar] : object of type 'closure' is not subsettable

First adding a third variable using df1$value <- 1L also does not solve it.

Please note that in my data, there is no exact match between length and width of the data, unlike in said post. Any help is still appreciated, please.

Upvotes: 4

Views: 1233

Answers (2)

AndreasM
AndreasM

Reputation: 942

You can create a frequency table with table() and transform the result into a data.frame.

x <- table(df1$sample, df1$LETTER)
df2 <- cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))

sample P Q R S T U V W X Y Z
a      a 1 0 1 0 0 0 1 0 0 1 0
b      b 0 1 0 0 0 0 0 0 0 0 0
c      c 0 1 1 1 1 1 0 1 1 0 1
d      d 0 1 0 0 0 0 0 0 1 0 0
e      e 0 1 0 0 0 0 1 0 1 0 0
g      g 0 1 0 0 1 0 0 0 0 0 0

If you want to include sample = f (not present in df1) in the output you could add the missing value as a factor level to df$sample before calling table():

df1$sample <- factor(df1$sample, levels = letters[1:7])
x <- table(df1$sample2, df1$LETTER)
cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))

  sample P Q R S T U V W X Y Z
a      a 1 0 1 0 0 0 1 0 0 1 0
b      b 0 1 0 0 0 0 0 0 0 0 0
c      c 0 1 1 1 1 1 0 1 1 0 1
d      d 0 1 0 0 0 0 0 0 1 0 0
e      e 0 1 0 0 0 0 1 0 1 0 0
f      f 0 0 0 0 0 0 0 0 0 0 0
g      g 0 1 0 0 1 0 0 0 0 0 0

Upvotes: 3

Ronak Shah
Ronak Shah

Reputation: 388817

You can create a dummy column and get data in wide format :

library(dplyr)

df1 %>%
  mutate(n = 1) %>%
  tidyr::pivot_wider(names_from = LETTER, values_from = n, values_fill = 0)

#  sample     P     R     V     Y     Q     S     T     U     W     X     Z
#  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a          1     1     1     1     0     0     0     0     0     0     0
#2 b          0     0     0     0     1     0     0     0     0     0     0
#3 c          0     1     0     0     1     1     1     1     1     1     1
#4 d          0     0     0     0     1     0     0     0     0     1     0
#5 e          0     0     1     0     1     0     0     0     0     1     0
#6 g          0     0     0     0     1     0     1     0     0     0     0

Or in data.table :

library(data.table)
setDT(df1)[, n := 1]
dcast(df1, sample~LETTER, value.var = 'n', fill = 0)

Upvotes: 1

Related Questions