NBK

Reputation: 905

gather function loses first column in original dataframe

I'm trying to tidy a dataframe with gather, but am losing info in the process. Take the following dataframe:

df <- data.frame(user = letters[1:10], score_1 = sample(1:20, 10), score_2 = sample(1:20, 10))

> head(df)
  user score_1 score_2
1    a       5       7
2    b      15       2
3    c      18      15
4    d       1      20
5    e      17      12
6    f       8      19

From here, I need to keep the identity of the user (first column), turn column names score_1 and score_2 into keys in a new "key" column, and convert all values of columns 2 and 3 into the values of my new "value" column. This is what I've tried:

gather(df, key = "user", value = "score", c(2,3))

      user score
1  score_1     5
2  score_1    15
3  score_1    18
4  score_1     1
5  score_1    17
6  score_1     8
7  score_1     2
8  score_1    20
9  score_1     9
10 score_1     3
11 score_2     7
12 score_2     2
13 score_2    15
14 score_2    20
15 score_2    12
16 score_2    19
17 score_2     8
18 score_2    13
19 score_2     4
20 score_2    18

This output is not satisfactory, because it loses the user column in the original dataframe. What am I doing wrong?

Upvotes: 1

Views: 235

Answers (3)

James Curran

Reputation: 1304

Your issue seems to be that the key reuses the name of an existing column (user). Try a key name that is not already in the data frame:

gather(df, key = "usr", value = "score", score_1, score_2)

Upvotes: 1

nurandi

Reputation: 1618

Note from the manual, ?gather:

Development on gather() is complete, and for new code we recommend switching to pivot_longer(), which is easier to use, more featureful, and still under active development

Here is the solution with pivot_longer():

library(tidyr)
df %>% 
   pivot_longer(starts_with("score"), names_to = "score")

# # A tibble: 20 x 3
#    user  score   value
#    <fct> <chr>   <int>
#  1 a     score_1    10
#  2 a     score_2    19
#  3 b     score_1    13
#  4 b     score_2    18
#  5 c     score_1    11
#  6 c     score_2     8
#  7 d     score_1    15
#  8 d     score_2    13
#  9 e     score_1    14
# 10 e     score_2     3
# 11 f     score_1     8
# 12 f     score_2     5
# 13 g     score_1     2
# 14 g     score_2    12
# 15 h     score_1    17
# 16 h     score_2    10
# 17 i     score_1     3
# 18 i     score_2     1
# 19 j     score_1     1
# 20 j     score_2    16
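
To get column names closer to the "key" and "value" columns described in the question, pivot_longer() also accepts values_to. The following is a minimal sketch, not part of the original answer: the seed and the column names "key" and "score" are assumptions chosen for illustration.

```r
library(tidyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(user = letters[1:10],
                 score_1 = sample(1:20, 10),
                 score_2 = sample(1:20, 10))

# names_to names the column holding the old column names,
# values_to names the column holding the cell values.
long <- pivot_longer(df,
                     cols = starts_with("score"),
                     names_to = "key",     # hypothetical name
                     values_to = "score")  # hypothetical name

names(long)  # the user column is kept alongside the two new columns
```

Because neither new name matches an existing column, user stays in the result.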

Upvotes: 2

akrun

Reputation: 887851

The reason is that 'key' was specified as 'user'. The key argument names the new column that holds the original column names in the 'long' format. If we set it to 'user', which is already a column in the dataset, the new column overwrites the existing one. Pick a name that is not already used, and exclude user from the columns being gathered:

library(tidyr)
gather(df, key = "user1", value = "score", -user)
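
A quick way to confirm that the user column survives is to inspect the column names of the result. This is a minimal sketch under stated assumptions: the seed is added only for reproducibility, and the name "key" is an arbitrary choice that does not clash with any existing column.

```r
library(tidyr)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(user = letters[1:10],
                 score_1 = sample(1:20, 10),
                 score_2 = sample(1:20, 10))

# -user gathers every column except user, so user is kept as an identifier.
long <- gather(df, key = "key", value = "score", -user)

names(long)  # "user" "key" "score"
nrow(long)   # 20: two score rows for each of the 10 users
```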

Upvotes: 2
