Reputation: 1353
I have data.frame
df1
(see code below). I would like to convert it to what df2
looks like (see code below).
Maybe this can be done with reshape
cast
or reverse melt
? But I do not understand these functions. Can anyone help, please?
df1 <- data.frame(
stringsAsFactors = FALSE,
sample = c("a","a","a",
"a","b","c","c","c","c","c","c","c","c",
"d","d","e","e","e","g","g"),
LETTER = c("P","R","V",
"Y","Q","Q","R","S","T","U","W","X","Z",
"Q","X","Q","V","X","Q","T")
)
df2 <- data.frame(
stringsAsFactors = FALSE,
sample = c("a", "b", "c", "d", "e", "f", "g"),
P = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
Q = c(0L, 1L, 1L, 1L, 1L, 0L, 1L),
R = c(1L, 0L, 1L, 0L, 0L, 0L, 0L),
S = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
T = c(0L, 0L, 1L, 0L, 0L, 0L, 1L),
U = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
V = c(1L, 0L, 0L, 0L, 1L, 0L, 0L),
W = c(0L, 0L, 1L, 0L, 0L, 0L, 0L),
X = c(0L, 0L, 1L, 1L, 1L, 0L, 0L),
Y = c(1L, 0L, 0L, 0L, 0L, 0L, 0L),
Z = c(0L, 0L, 1L, 0L, 0L, 0L, 0L)
)
Edit
It was suggested that I look at this post: How to reshape data from long to wide format. Unfortunately, that does not answer my question. The equivalent code would be as follows and throws the following error.
df2 <- reshape(df, idvar = "sample", timevar = "LETTER", direction = "wide")
Error in data[, timevar] : object of type 'closure' is not subsettable
First adding a third variable using df1$value <- 1L
also does not solve it.
Please note that in my data, there is no exact match between length and width of the data, unlike in said post. Any help is still appreciated, please.
Upvotes: 4
Views: 1233
Reputation: 942
You can create a frequency table with table()
and transform the result into a data.frame.
x <- table(df1$sample, df1$LETTER)
df2 <- cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))
sample P Q R S T U V W X Y Z
a a 1 0 1 0 0 0 1 0 0 1 0
b b 0 1 0 0 0 0 0 0 0 0 0
c c 0 1 1 1 1 1 0 1 1 0 1
d d 0 1 0 0 0 0 0 0 1 0 0
e e 0 1 0 0 0 0 1 0 1 0 0
g g 0 1 0 0 1 0 0 0 0 0 0
If you want to include sample = f (not present in df1) in the output you could add the missing value as a factor level to df$sample
before calling table()
:
df1$sample <- factor(df1$sample, levels = letters[1:7])
x <- table(df1$sample2, df1$LETTER)
cbind(data.frame(sample = rownames(x)), as.data.frame.matrix(x))
sample P Q R S T U V W X Y Z
a a 1 0 1 0 0 0 1 0 0 1 0
b b 0 1 0 0 0 0 0 0 0 0 0
c c 0 1 1 1 1 1 0 1 1 0 1
d d 0 1 0 0 0 0 0 0 1 0 0
e e 0 1 0 0 0 0 1 0 1 0 0
f f 0 0 0 0 0 0 0 0 0 0 0
g g 0 1 0 0 1 0 0 0 0 0 0
Upvotes: 3
Reputation: 388817
You can create a dummy column and get data in wide format :
library(dplyr)
df1 %>%
mutate(n = 1) %>%
tidyr::pivot_wider(names_from = LETTER, values_from = n, values_fill = 0)
# sample P R V Y Q S T U W X Z
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 a 1 1 1 1 0 0 0 0 0 0 0
#2 b 0 0 0 0 1 0 0 0 0 0 0
#3 c 0 1 0 0 1 1 1 1 1 1 1
#4 d 0 0 0 0 1 0 0 0 0 1 0
#5 e 0 0 1 0 1 0 0 0 0 1 0
#6 g 0 0 0 0 1 0 1 0 0 0 0
Or in data.table
:
library(data.table)
setDT(df1)[, n := 1]
dcast(df1, sample~LETTER, value.var = 'n', fill = 0)
Upvotes: 1