Tyler Rinker

Reputation: 109844

Comparing gather (tidyr) to melt (reshape2)

I love the reshape2 package because it made life so doggone easy. Hadley's newer packages typically improve on his earlier ones, enabling streamlined, faster-running code. I figured I'd give tidyr a whirl, and from what I read I thought gather was very similar to melt from reshape2. But even after reading the documentation I can't get gather to do the same task that melt does.

Data View

Here's a view of the data (actual data in dput form at end of post):

  teacher yr1.baseline     pd yr1.lesson1 yr1.lesson2 yr2.lesson1 yr2.lesson2 yr2.lesson3
1       3      1/13/09 2/5/09      3/6/09     4/27/09     10/7/09    11/18/09      3/4/10
2       7      1/15/09 2/5/09      3/3/09      5/5/09    10/16/09    11/18/09      3/4/10
3       8      1/27/09 2/5/09      3/3/09     4/27/09     10/7/09    11/18/09      3/5/10

Code

Here's the code using melt, followed by my attempt at gather. How can I make gather do the same thing as melt?

library(reshape2); library(dplyr); library(tidyr)

dat %>% 
   melt(id=c("teacher", "pd"), value.name="date") 

dat %>% 
   gather(key=c(teacher, pd), value=date, -c(teacher, pd)) 

Desired Output

   teacher     pd     variable     date
1        3 2/5/09 yr1.baseline  1/13/09
2        7 2/5/09 yr1.baseline  1/15/09
3        8 2/5/09 yr1.baseline  1/27/09
4        3 2/5/09  yr1.lesson1   3/6/09
5        7 2/5/09  yr1.lesson1   3/3/09
6        8 2/5/09  yr1.lesson1   3/3/09
7        3 2/5/09  yr1.lesson2  4/27/09
8        7 2/5/09  yr1.lesson2   5/5/09
9        8 2/5/09  yr1.lesson2  4/27/09
10       3 2/5/09  yr2.lesson1  10/7/09
11       7 2/5/09  yr2.lesson1 10/16/09
12       8 2/5/09  yr2.lesson1  10/7/09
13       3 2/5/09  yr2.lesson2 11/18/09
14       7 2/5/09  yr2.lesson2 11/18/09
15       8 2/5/09  yr2.lesson2 11/18/09
16       3 2/5/09  yr2.lesson3   3/4/10
17       7 2/5/09  yr2.lesson3   3/4/10
18       8 2/5/09  yr2.lesson3   3/5/10

Data

dat <- data.frame(
  teacher = factor(c("3", "7", "8")),
  yr1.baseline = factor(c("1/13/09", "1/15/09", "1/27/09")),
  pd = factor(c("2/5/09", "2/5/09", "2/5/09")),
  yr1.lesson1 = factor(c("3/6/09", "3/3/09", "3/3/09")),
  yr1.lesson2 = factor(c("4/27/09", "5/5/09", "4/27/09")),
  yr2.lesson1 = factor(c("10/7/09", "10/16/09", "10/7/09")),
  yr2.lesson2 = factor(c("11/18/09", "11/18/09", "11/18/09")),
  yr2.lesson3 = factor(c("3/4/10", "3/4/10", "3/5/10"))
)

Upvotes: 73

Views: 33400

Answers (3)

Sravan

Reputation: 11

My solution

dat %>%
  gather(!c(teacher, pd), key = variable, value = date)

Upvotes: 0

Joe

Reputation: 8601

As of tidyr 1.0.0, this task is accomplished with the more flexible pivot_longer().

The equivalent syntax would be

library(tidyr)
dat %>% pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")

which says, "Pivot everything longer except teacher and pd, calling the new variable column 'variable' and the new value column 'date'."

Note that pivot_longer() returns the long data grouped by the rows of the original data frame, with the pivoted columns cycling in their original order within each row, whereas gather() returned it grouped by the new variable column. To rearrange the resulting tibble, use dplyr::arrange().
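A minimal sketch of that reordering, assuming the dat object from the question and that dplyr is installed. The names pivoted, long, and melt_order are illustrative only. Sorting by each pivoted column's original position, then by teacher, should give rows in the column-by-column order that melt and gather return:

```r
library(tidyr)
library(dplyr)

# Data from the question
dat <- data.frame(
  teacher = factor(c("3", "7", "8")),
  yr1.baseline = factor(c("1/13/09", "1/15/09", "1/27/09")),
  pd = factor(c("2/5/09", "2/5/09", "2/5/09")),
  yr1.lesson1 = factor(c("3/6/09", "3/3/09", "3/3/09")),
  yr1.lesson2 = factor(c("4/27/09", "5/5/09", "4/27/09")),
  yr2.lesson1 = factor(c("10/7/09", "10/16/09", "10/7/09")),
  yr2.lesson2 = factor(c("11/18/09", "11/18/09", "11/18/09")),
  yr2.lesson3 = factor(c("3/4/10", "3/4/10", "3/5/10"))
)

# Names of the columns being pivoted, in their original left-to-right order
pivoted <- setdiff(names(dat), c("teacher", "pd"))

# pivot_longer() returns the rows teacher by teacher
long <- dat %>%
  pivot_longer(cols = -c(teacher, pd), names_to = "variable", values_to = "date")

# Re-sort column by column: first by the pivoted column's original position,
# then by teacher within each column
melt_order <- long %>%
  arrange(match(variable, pivoted), teacher)
```

Using match() against the original column names avoids relying on alphabetical order, which only coincidentally agrees with the column order in this data set.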

Upvotes: 13

David Robinson

Reputation: 78590

Your gather line should look like:

dat %>% gather(variable, date, -teacher, -pd)

This says "Gather all variables except teacher and pd, calling the new key column 'variable' and the new value column 'date'."


As an explanation, note the following from the help(gather) page:

 ...: Specification of columns to gather. Use bare variable names.
      Select all variables between x and z with ‘x:z’, exclude y
      with ‘-y’. For more options, see the select documentation.

Since this is an ellipsis, the specification of columns to gather is given as separate (bare name) arguments. We wish to gather all columns except teacher and pd, so we use -.
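To illustrate how the ellipsis accepts several separate selections, here is a rough sketch using the dat object from the question. The names by_exclusion and by_inclusion are illustrative only. It gathers the same six columns in two ways: by excluding teacher and pd, and by naming the columns to include, with a range that skips over pd:

```r
library(tidyr)

# Data from the question
dat <- data.frame(
  teacher = factor(c("3", "7", "8")),
  yr1.baseline = factor(c("1/13/09", "1/15/09", "1/27/09")),
  pd = factor(c("2/5/09", "2/5/09", "2/5/09")),
  yr1.lesson1 = factor(c("3/6/09", "3/3/09", "3/3/09")),
  yr1.lesson2 = factor(c("4/27/09", "5/5/09", "4/27/09")),
  yr2.lesson1 = factor(c("10/7/09", "10/16/09", "10/7/09")),
  yr2.lesson2 = factor(c("11/18/09", "11/18/09", "11/18/09")),
  yr2.lesson3 = factor(c("3/4/10", "3/4/10", "3/5/10"))
)

# Exclusion: each column to leave out is its own bare-name argument.
# gather() may warn here that factor attributes differ across the gathered
# columns; the gathered values then come back as character.
by_exclusion <- dat %>% gather(variable, date, -teacher, -pd)

# Inclusion: one single column plus one x:z range, again as separate
# arguments. pd sits between yr1.baseline and yr1.lesson1, so the range
# starts after it.
by_inclusion <- dat %>%
  gather(variable, date, yr1.baseline, yr1.lesson1:yr2.lesson3)
```

Both calls pass the key and value names first and everything after them through the ellipsis, so either style can be used depending on which set of columns is shorter to write.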

Upvotes: 91
