Jacob Lindberg

Reputation: 51

Tidy data Melt and Cast

In Wickham's Tidy Data paper (PDF), he gives an example of going from messy to tidy data.

Where can I find the code for it?

For example, what code is used to go from

Table 1: Typical presentation dataset.

to

Table 3: The same data as in Table 1 but with variables in columns and observations in rows.

Perhaps melt or cast, but from http://www.statmethods.net/management/reshape.html I can't see how.

(Note to self: Need it for GDPpercapita...)

Upvotes: 2

Views: 872

Answers (1)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

The answer depends on how your data are structured. In the paper you linked to, Hadley was writing about the "reshape" and "reshape2" packages.

It's ambiguous what the data structure is in "Table 1". Judging by the description, it sounds like a matrix with named dimnames (like mymat in the sample data below). In that case, a simple melt would work:

library(reshape2)
melt(mymat)
#           Var1       Var2 value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1

If it is not a matrix but a data.frame with row.names, you can still use the matrix method with something like melt(as.matrix(mymat)).
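As a minimal sketch of the row.names case: the data.frame mydf_rn below is hypothetical (it is not part of the sample data), and it assumes numeric columns with NA standing in for the missing value. The varnames argument is optional and only replaces the default Var1/Var2 column names.

```r
library(reshape2)

# Hypothetical data.frame with the person names stored as row.names
mydf_rn <- data.frame(
  treatmenta = c(NA, 16, 3),
  treatmentb = c(2, 11, 1),
  row.names = c("John Smith", "Jane Doe", "Mary Johnson")
)

# as.matrix() keeps the row and column names as dimnames,
# so melt() uses its matrix (array) method
long <- melt(as.matrix(mydf_rn), varnames = c("name", "treatment"))
```

The result has one row per name/treatment pair, with the measurements in the value column.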

If, on the other hand, the "names" are a column in a data.frame (as they are in the "tidyr" vignette), you need to specify either the id.vars or the measure.vars so that melt knows how to treat the columns.

melt(mydf, id.vars = "name")
#           name   variable value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1
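Since the question also asks about cast, here is a rough sketch of the measure.vars alternative and of going back to the wide layout with dcast. The data.frame df is an illustrative stand-in for the sample data (numeric columns, NA for the missing value), not the original object.

```r
library(reshape2)

# Illustrative data.frame with the names in a regular column
df <- data.frame(
  name = c("John Smith", "Jane Doe", "Mary Johnson"),
  treatmenta = c(NA, 16, 3),
  treatmentb = c(2, 11, 1)
)

# Naming the measured columns instead: every other column is treated as an id
long <- melt(df, measure.vars = c("treatmenta", "treatmentb"))

# dcast() reverses the melt: one row per name, one column per treatment
wide <- dcast(long, name ~ variable, value.var = "value")
```

Specifying id.vars = "name" instead of measure.vars should give the same long result here, because "name" is the only remaining column.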

The new kid on the block is "tidyr". The "tidyr" package works with data.frames because it is often used in conjunction with dplyr. I won't reproduce the code for "tidyr" here, because that is sufficiently covered in the vignette.
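For quick reference only, a minimal sketch of the same reshaping in "tidyr" might look like the following. It assumes a recent tidyr (1.0.0 or later), where pivot_longer() supersedes the older gather(); the data.frame df and the output column names "treatment" and "n" are illustrative choices, with NA standing in for the missing value.

```r
library(tidyr)

# Illustrative data.frame with the names in a regular column
df <- data.frame(
  name = c("John Smith", "Jane Doe", "Mary Johnson"),
  treatmenta = c(NA, 16, 3),
  treatmentb = c(2, 11, 1)
)

# Stack the two treatment columns into name/value pairs
long <- pivot_longer(
  df,
  cols = c(treatmenta, treatmentb),
  names_to = "treatment",
  values_to = "n"
)
```

Unlike melt, pivot_longer keeps the rows grouped by the original row, so each person's treatments appear next to each other.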


Sample data:

mymat <- structure(c("—", "16", "3", " 2", "11", " 1"), .Dim = c(3L, 
    2L), .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"
    ), c("treatmenta", "treatmentb")))

mydf <- structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Jane Doe", 
    "John Smith", "Mary Johnson"), class = "factor"), treatmenta = c("—", 
    "16", "3"), treatmentb = c(2L, 11L, 1L)), .Names = c("name", 
    "treatmenta", "treatmentb"), row.names = c(NA, 3L), class = "data.frame")

Upvotes: 2
