S31

Reputation: 934

Understanding tidyr::gather key/value arguments

Simple question:

I've provided two pipelines on the same data frame below, with code and output. Why does one work and the other doesn't? I'm having trouble understanding the key/value arguments: when do they need to be explicitly defined, and what does it mean to pass them as strings?

library(tidyverse)
dat <- data.frame(one = c("x", "x", "x"), two = c("x", "", "x"), 
                   three = c("", "", ""), type = c("chocolate", "vanilla", "strawberry")) 

dat %>%
  na_if("") %>%
  gather("Key", "Val", -type, na.rm = TRUE) %>%
  rowid_to_column %>%
  spread(Key, Val, fill = "") %>%
  select(-1) # works well

dat %>%
  na_if("") %>%
  gather("Key", "Val", -type, na.rm = TRUE)
# Error: Strings must match column names. Unknown columns: Val

Extra credit: if someone could explain the effect of rowid_to_column() and spread(), that'd be helpful.

Upvotes: 0

Views: 764

Answers (1)

Maurits Evers

Reputation: 50668

Perhaps I'm missing something, but I can't reproduce your error.

dat %>%
  na_if("") %>%                                  # Replace "" with NA
  gather("Key", "Val", -type, na.rm = TRUE) %>%  # wide -> long
  rowid_to_column()  %>%                         # Sequentially number rows
  spread(Key, Val, fill = "") %>%                # long -> wide
  select(-1)                                     # Remove the row-number column
  #        type one two
  #1  chocolate   x
  #2    vanilla   x
  #3 strawberry   x
  #4  chocolate       x
  #5 strawberry       x



dat %>%                                          
  na_if("") %>%                                  # Replace "" with NA
  gather("Key", "Val", -type, na.rm = TRUE)      # wide -> long
#        type Key Val
#1  chocolate one   x
#2    vanilla one   x
#3 strawberry one   x
#4  chocolate two   x
#6 strawberry two   x
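
On the Key/Val part of the question: as far as I can tell from the documentation, gather() takes the names of the new key and value columns either as strings or as bare symbols, so the two calls below should give the same result. This is a minimal sketch, assuming a reasonably recent tidyr and dplyr (>= 1.0.0 for across()). The column-wise na_if() is my substitution, because newer dplyr versions no longer accept a whole data frame in na_if(), and the names long_quoted and long_bare are only illustrative.

```r
library(dplyr)
library(tidyr)

dat <- data.frame(
  one   = c("x", "x", "x"),
  two   = c("x", "",  "x"),
  three = c("",  "",  ""),
  type  = c("chocolate", "vanilla", "strawberry"),
  stringsAsFactors = FALSE
)

# Assumption: replace "" with NA column by column instead of na_if(dat, "")
dat_na <- dat %>%
  mutate(across(everything(), ~ na_if(.x, "")))

# New column names given as strings
long_quoted <- dat_na %>%
  gather("Key", "Val", -type, na.rm = TRUE)

# New column names given as bare symbols
long_bare <- dat_na %>%
  gather(Key, Val, -type, na.rm = TRUE)

# Both forms only name the two new columns; neither has to exist beforehand
all.equal(long_quoted, long_bare)
names(long_quoted)
```

In both forms, Key and Val are names for columns that gather() creates, which is why they do not need to match anything in dat.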

Explanation:

  1. na_if("") replaces "" entries with NA.
  2. gather("Key", "Val", -type, na.rm = TRUE) turns the wide table into a long "key-value" table: every column except type is stacked into two new columns, Key (the original column name) and Val (the entry). "Key" and "Val" are simply the names given to these new columns. na.rm = TRUE drops rows whose value is NA.
  3. rowid_to_column() adds a column of sequential row numbers, so every row of the long table has a unique identifier. Without it, spread() would collapse rows that share the same type into a single row.
  4. spread(Key, Val, fill = "") turns the long "key-value" table back into a wide table, with one column per unique key in Key. Entries are taken from column Val; a missing entry is filled with "".
  5. select(-1) removes the first column, i.e. the row number added in step 3.
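
To make the effect of rowid_to_column() on spread() concrete, here is a small sketch that runs the long-to-wide step with and without the row id. It assumes dplyr >= 1.0.0 (for across()), and it uses a column-wise na_if() as my substitution for na_if() on the whole data frame, which newer dplyr versions reject. The names long, collapsed and per_row are only illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)

dat <- data.frame(
  one   = c("x", "x", "x"),
  two   = c("x", "",  "x"),
  three = c("",  "",  ""),
  type  = c("chocolate", "vanilla", "strawberry"),
  stringsAsFactors = FALSE
)

# wide -> long, dropping the NA entries (5 rows remain)
long <- dat %>%
  mutate(across(everything(), ~ na_if(.x, ""))) %>%
  gather("Key", "Val", -type, na.rm = TRUE)

# Without a row id, type is the only identifier, so spread() returns
# one row per type
collapsed <- long %>%
  spread(Key, Val, fill = "")

# With a row id, every long row is its own identifier, so spread() keeps
# one wide row per long row
per_row <- long %>%
  rowid_to_column() %>%
  spread(Key, Val, fill = "") %>%
  select(-1)

nrow(collapsed)  # 3
nrow(per_row)    # 5
```

Which of the two shapes is wanted depends on the goal: collapsed gives one row per type, per_row gives the five-row output shown above.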

Upvotes: 1
