Mario GS

Reputation: 879

Issues understanding gather function in tidyr

I'm having some issues understanding the gather function of tidyr. I have the following dataframe:

tidyv1 <- data.frame(name=c("Jake","Alice","Tim","Denise"),
                age=c(34,55,76,19),
                brown=c(0,0,1,0),
                blue=c(0,1,0,0),
                other=c(1,0,0,1),
                height=c(6.1,5.9,5.7,5.1))

I would like to take columns brown:other and make them one variable. Here is my code:

tidyc1 <- gather(tidyv1, key=eye_color, value=val, brown:other, factor_key=TRUE)

The outcome is this:

     name age height eye_color val
1    Jake  34    6.1     brown   0
2   Alice  55    5.9     brown   0
3     Tim  76    5.7     brown   1
4  Denise  19    5.1     brown   0
5    Jake  34    6.1      blue   0
6   Alice  55    5.9      blue   1
7     Tim  76    5.7      blue   0
8  Denise  19    5.1      blue   0
9    Jake  34    6.1     other   1
10  Alice  55    5.9     other   0
11    Tim  76    5.7     other   0
12 Denise  19    5.1     other   1

The outcome that I'm expecting is this:

    name age eye_color height
1   Jake  34     other    6.1
2  Alice  55      blue    5.9
3    Tim  76     brown    5.7
4 Denise  19     other    5.1

I'm aware that this can easily be fixed with extra code, but I want to understand whether there is a more direct way. For instance, this works:

tidyc1[which(tidyc1[,5]==1),1:4]

Upvotes: 1

Views: 603

Answers (1)

alistaire

Reputation: 43334

gather rearranges data by melting column names into one column (the key) and their values into another, but it doesn't drop data. In tidyv1, you have data recording that people don't have certain eye colors as well as that they do, and gather keeps all of it. If the zeros were NAs instead, you could use na.rm = TRUE to drop those rows, but you'd still end up with an extra val column.
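To illustrate the na.rm point, here is a minimal sketch that assumes the zeros are first recoded as NA. The names eye_cols, tidy_na and gathered are made up for this example, and the recoding step is an assumption of mine rather than something in the question.

```r
library(tidyr)

tidyv1 <- data.frame(name = c("Jake", "Alice", "Tim", "Denise"),
                     age = c(34, 55, 76, 19),
                     brown = c(0, 0, 1, 0),
                     blue = c(0, 1, 0, 0),
                     other = c(1, 0, 0, 1),
                     height = c(6.1, 5.9, 5.7, 5.1))

# Assumption for illustration: treat 0 as "missing" in the eye color columns
eye_cols <- c("brown", "blue", "other")
tidy_na <- tidyv1
tidy_na[eye_cols] <- lapply(tidy_na[eye_cols], function(x) replace(x, x == 0, NA))

# na.rm = TRUE drops the rows whose value is NA, leaving one row per person,
# but the val column (now all 1s) is still present
gathered <- gather(tidy_na, key = eye_color, value = val, brown:other,
                   na.rm = TRUE, factor_key = TRUE)
gathered
```

So even in the NA case you would still need to drop val afterwards.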

Thus, gather itself doesn't directly do what you want. You can clean up after the fact with

tidyc1[tidyc1$val == 1, -5]

...or inline with dplyr:

library(tidyr)
library(dplyr)
tidyv1 %>% gather(key=eye_color, value=val, brown:other, factor_key=TRUE) %>%
    filter(val == 1) %>% select(-val)

...or just do the whole operation with dplyr:

tidyv1 %>% rowwise() %>%
    mutate(eye_color = c('brown', 'blue', 'other')[which(c(brown, blue, other) == 1)]) %>%
    ungroup() %>%
    select(-(brown:other))

...or with base:

tidyv1$eye_color <- apply(tidyv1[,c('brown', 'blue', 'other')], 1, 
                          function(x){c('brown', 'blue', 'other')[x == 1]})
tidyv1 <- tidyv1[,-3:-5]

You end up with the same result whichever you use (only the row order differs, since the gather versions sort by eye color), so pick your favorite.
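If you want to convince yourself, a quick check along these lines should do. It is only a sketch: via_gather, via_base and same are names invented here, and both results are sorted by name first so that row order doesn't matter.

```r
library(tidyr)
library(dplyr)

tidyv1 <- data.frame(name = c("Jake", "Alice", "Tim", "Denise"),
                     age = c(34, 55, 76, 19),
                     brown = c(0, 0, 1, 0),
                     blue = c(0, 1, 0, 0),
                     other = c(1, 0, 0, 1),
                     height = c(6.1, 5.9, 5.7, 5.1))

# gather + filter approach, sorted by name
via_gather <- tidyv1 %>%
    gather(key = eye_color, value = val, brown:other) %>%
    filter(val == 1) %>%
    select(-val) %>%
    arrange(name)

# base approach on a copy, sorted the same way
via_base <- tidyv1
via_base$eye_color <- apply(via_base[, c("brown", "blue", "other")], 1,
                            function(x) c("brown", "blue", "other")[x == 1])
via_base <- via_base[, -3:-5]
via_base <- via_base[order(via_base$name), ]

# Compare the derived eye colors person by person
same <- identical(as.character(via_gather$eye_color),
                  as.character(via_base$eye_color))
same
```

Note this relies on every person having exactly one eye color column set to 1, as in the example data.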

Upvotes: 3
