Reputation: 905
I'm trying to tidy a dataframe with gather, but am losing info in the process. Take the following dataframe:
df <- data.frame(user = letters[1:10], score_1 = sample(1:20, 10), score_2 = sample(1:20, 10))
> head(df)
user score_1 score_2
1 a 5 7
2 b 15 2
3 c 18 15
4 d 1 20
5 e 17 12
6 f 8 19
From here, I need to keep the identity of the user (first column), turn column names score_1 and score_2 into keys in a new "key" column, and convert all values of columns 2 and 3 into the values of my new "value" column. This is what I've tried:
gather(df, key = "user", value = "score", c(2,3))
user score
1 score_1 5
2 score_1 15
3 score_1 18
4 score_1 1
5 score_1 17
6 score_1 8
7 score_1 2
8 score_1 20
9 score_1 9
10 score_1 3
11 score_2 7
12 score_2 2
13 score_2 15
14 score_2 20
15 score_2 12
16 score_2 19
17 score_2 8
18 score_2 13
19 score_2 4
20 score_2 18
This output is not satisfactory, because it loses the user column in the original dataframe. What am I doing wrong?
Upvotes: 1
Views: 235
Reputation: 1304
Your issue seems to be using the name of an existing column (user) as the key. Try a name that is not already taken:
gather(df, key = "usr", value = "score", score_1, score_2)
Upvotes: 1
Reputation: 1618
Note from the manual, ?gather:
Development on gather() is complete, and for new code we recommend switching to pivot_longer(), which is easier to use, more featureful, and still under active development
Here is the solution:
library(tidyr)
df %>%
  pivot_longer(starts_with("score"), names_to = "score")
# # A tibble: 20 x 3
# user score value
# <fct> <chr> <int>
# 1 a score_1 10
# 2 a score_2 19
# 3 b score_1 13
# 4 b score_2 18
# 5 c score_1 11
# 6 c score_2 8
# 7 d score_1 15
# 8 d score_2 13
# 9 e score_1 14
# 10 e score_2 3
# 11 f score_1 8
# 12 f score_2 5
# 13 g score_1 2
# 14 g score_2 12
# 15 h score_1 17
# 16 h score_2 10
# 17 i score_1 3
# 18 i score_2 1
# 19 j score_1 1
# 20 j score_2 16
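If the goal is the "key" and "value" column names described in the question, a minimal sketch along the following lines may help. The seed is an arbitrary choice so the example is reproducible, and the columns are listed explicitly rather than matched with starts_with(); both are assumptions for illustration, not part of the original answer.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(user = letters[1:10],
                 score_1 = sample(1:20, 10),
                 score_2 = sample(1:20, 10))

# names_to sets the column that holds the old column names,
# values_to sets the column that holds the cell values
long <- pivot_longer(df,
                     cols = c(score_1, score_2),
                     names_to = "key",
                     values_to = "value")

names(long)  # user, key, value
```

The user column is carried along untouched because it is not among the selected cols.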
Upvotes: 2
Reputation: 887851
The reason is that the 'key' was specified as 'user'. Here, key is the name of the new column that holds the original column names in the 'long' format. If we specify it as 'user' (which is already a column in the dataset), the existing column is replaced by the new one. Use a name that is not already taken, and exclude user from the columns being gathered:
library(tidyr)
gather(df, key = "user1", value = "score", -user)
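As a quick check, the sketch below rebuilds a data frame like the one in the question and confirms that the user column survives once the key gets an unused name. The seed and the name score_col are arbitrary choices made for illustration.

```r
library(tidyr)

set.seed(1)  # arbitrary seed, only for reproducibility
df <- data.frame(user = letters[1:10],
                 score_1 = sample(1:20, 10),
                 score_2 = sample(1:20, 10))

# 'score_col' is an illustrative name; any name not already used by a column works
long <- gather(df, key = "score_col", value = "score", -user)

names(long)  # user, score_col, score
nrow(long)   # 20 rows: 10 users x 2 score columns
```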
Upvotes: 2