Reputation: 51
In Wickham's Tidy Data paper (PDF), he gives an example of going from messy to tidy data.
Where is the code for it?
For example, what code takes you from "Table 1: Typical presentation dataset." to "Table 3: The same data as in Table 1 but with variables in columns and observations in rows."?
Perhaps melt or cast? But from http://www.statmethods.net/management/reshape.html I can't see how.
(Note to self: Need it for GDPpercapita...)
Upvotes: 2
Views: 872
Reputation: 193517
The answer depends on how your data are structured. In the paper you linked to, Hadley was writing about the "reshape" and "reshape2" packages.
It's ambiguous what the data structure in "Table 1" is. Judging by the description, it sounds like a matrix with dimnames (row and column names), like mymat in the sample data below. In that case, a simple melt works:
library(reshape2)
melt(mymat)
# Var1 Var2 value
# 1 John Smith treatmenta —
# 2 Jane Doe treatmenta 16
# 3 Mary Johnson treatmenta 3
# 4 John Smith treatmentb 2
# 5 Jane Doe treatmentb 11
# 6 Mary Johnson treatmentb 1
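If reshape2 is not available, base R can do the same matrix-to-long conversion. The sketch below is an alternative to melt, not what the paper uses. It assumes the dash in Table 1 marks a missing value (stored here as NA), that the counts are numeric, and it names the dimnames "name" and "treatment" purely for illustration.

```r
# Base-R alternative to reshape2::melt() for a matrix with dimnames.
# Assumptions (not from the original data): the dash is stored as NA,
# values are numeric, and the dimnames are named "name" and "treatment".
mymat <- matrix(c(NA, 16, 3, 2, 11, 1), nrow = 3,
                dimnames = list(name = c("John Smith", "Jane Doe", "Mary Johnson"),
                                treatment = c("treatmenta", "treatmentb")))

# as.table() keeps the dimnames; as.data.frame() then emits one row per cell,
# using the dimnames' names as column names and "value" for the cell contents.
long <- as.data.frame(as.table(mymat), responseName = "value",
                      stringsAsFactors = FALSE)
print(long)
```

The rows come out in the same order as melt(mymat): the row dimension varies fastest, so all treatmenta rows precede the treatmentb rows.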
If it were not a matrix but a data.frame with row.names, you could still use the matrix method with something like melt(as.matrix(mymat)).
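The data.frame-with-row.names route works because as.matrix() carries the row names over into the matrix's dimnames, which is what melt (or the base-R as.table() route) relies on. A small sketch, where df_rn and m are made-up names and the values are assumed numeric with NA for the dash:

```r
# A data.frame whose row names hold the people (hypothetical df_rn).
df_rn <- data.frame(treatmenta = c(NA, 16, 3),
                    treatmentb = c(2, 11, 1),
                    row.names = c("John Smith", "Jane Doe", "Mary Johnson"))

# as.matrix() turns the row names into the first component of dimnames.
m <- as.matrix(df_rn)
names(dimnames(m)) <- c("name", "treatment")  # optional: label the dimensions
print(dimnames(m))

# From here the matrix route applies, e.g. melt(m) with reshape2,
# or this base-R equivalent:
long <- as.data.frame(as.table(m), responseName = "value",
                      stringsAsFactors = FALSE)
print(long)
```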
If, on the other hand, the "names" are a column in a data.frame (as they are in the "tidyr" vignette), you need to specify either the id.vars or the measure.vars so that melt knows how to treat the columns:
melt(mydf, id.vars = "name")
# name variable value
# 1 John Smith treatmenta —
# 2 Jane Doe treatmenta 16
# 3 Mary Johnson treatmenta 3
# 4 John Smith treatmentb 2
# 5 Jane Doe treatmentb 11
# 6 Mary Johnson treatmentb 1
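Without reshape2, stats::reshape() can express the same id-column case: its idvar plays roughly the role of id.vars and varying the role of measure.vars. This is a base-R sketch, not the reshape2 API; the output column names "variable" and "value" are chosen only to mirror melt's output, and the treatment values are assumed numeric with NA for the dash.

```r
# Hypothetical mydf with numeric treatments (the dash stored as NA).
mydf <- data.frame(name = c("John Smith", "Jane Doe", "Mary Johnson"),
                   treatmenta = c(NA, 16, 3),
                   treatmentb = c(2, 11, 1),
                   stringsAsFactors = FALSE)

long <- reshape(mydf, direction = "long",
                idvar = "name",                           # like id.vars
                varying = c("treatmenta", "treatmentb"),  # like measure.vars
                v.names = "value",                        # column for the cell values
                timevar = "variable",                     # column for the old column names
                times = c("treatmenta", "treatmentb"))
rownames(long) <- NULL  # reshape() builds row names like "John Smith.treatmenta"
print(long)
```

reshape() is more verbose than melt, but it ships with R and also handles the reverse (wide) direction.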
The new kid on the block is "tidyr". It works with data.frames because it is often used together with dplyr. I won't reproduce the "tidyr" code here, because the vignette already covers it well.
Sample data:
mymat <- structure(c("—", "16", "3", " 2", "11", " 1"), .Dim = c(3L,
2L), .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"
), c("treatmenta", "treatmentb")))
mydf <- structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Jane Doe",
"John Smith", "Mary Johnson"), class = "factor"), treatmenta = c("—",
"16", "3"), treatmentb = c(2L, 11L, 1L)), .Names = c("name",
"treatmenta", "treatmentb"), row.names = c(NA, 3L), class = "data.frame")
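In the sample data every cell is a string, and the dash from Table 1 is kept as a literal character. Assuming the dash means "missing" (a reading of the table, not something stated in the paper's code), one way to get numeric values before melting is sketched below. The string "\u2014" is just the escaped form of that dash character, used so the snippet stays plain ASCII.

```r
# Same sample matrix as above, with the dash written as the escape "\u2014".
mymat <- structure(c("\u2014", "16", "3", " 2", "11", " 1"), .Dim = c(3L, 2L),
                   .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"),
                                    c("treatmenta", "treatmentb")))

clean <- mymat
clean[clean == "\u2014"] <- NA    # assumption: the dash marks a missing value
storage.mode(clean) <- "double"   # " 2" becomes 2, etc.; dim and dimnames are kept
print(clean)
# melt(clean) from reshape2 would now give a numeric value column.
```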
Upvotes: 2