Reputation: 133
I have survey data I'm working on in R that includes questions where respondents were given a list of options and asked to rank their top five. The data looks like this:
head(data)
responseid ChoiceA ChoiceB ChoiceC ChoiceD ChoiceE ChoiceF
1 001 5 2 1 NA 4 3
2 002 NA 4 3 5 2 1
3 003 3 1 NA 2 4 5
4 004 NA 5 2 1 3 4
I'd like to reshape it so that it looks like this:
head(data_new)
responseid first second third fourth fifth
1 001 C B F E A
2 002 F E C B D
3 003 B D A E F
4 004 D C E F B
What would be an efficient way to do this, preferably in tidyverse? I feel like the answer is going to involve changing the values in a new column based on the names of other columns, which I can't figure out how to do.
I'm also pulling this data out of SurveyGizmo, so if anyone has suggestions for a way to set it up so that the data exports to a csv the way I want it in the first place, I would appreciate that as well. Thanks!
Upvotes: 1
Views: 278
Reputation: 173793
For interest, a one-liner in base R that produces the correct ordering:
gsub("Choice", "", t(apply(data_new, 1, function(x) names(data_new)[-1][order(x[-1])]))[,1:5])
#> [,1] [,2] [,3] [,4] [,5]
#> 1 "C" "B" "F" "E" "A"
#> 2 "F" "E" "C" "B" "D"
#> 3 "B" "D" "A" "E" "F"
#> 4 "D" "C" "E" "F" "B"
Upvotes: 1
Reputation: 887048
We can reshape into 'long' format and then pivot it to 'wide' format
library(dplyr)
library(tidyr)
library(english)
df1 %>%
pivot_longer(cols = -responseid, values_drop_na = TRUE) %>%
separate(name, into = c("name1", "name2"), "(?<=[a-z])(?=[A-Z])") %>%
mutate(value = as.character(ordinal(value))) %>%
pivot_wider(names_from = value, values_from = name2)
Upvotes: 5
Reputation: 15072
The gist is that you want to pivot longer, clean it up a bit and then pivot wider using the ranks as the column names instead of the choices. This is a similar method to akrun, though I think given the limit of five choices hardcoding the ordinal column names is less complex.
library(tidyverse)
tbl <- read_table2(
"responseid ChoiceA ChoiceB ChoiceC ChoiceD ChoiceE ChoiceF
001 5 2 1 NA 4 3
002 NA 4 3 5 2 1
003 3 1 NA 2 4 5
004 NA 5 2 1 3 4"
)
tbl %>%
pivot_longer(
cols = -responseid,
names_to = "choice",
values_to = "rank",
values_drop_na = TRUE
) %>%
mutate(
choice = str_remove(choice, "Choice"),
rank = c("first", "second", "third", "fourth", "fifth")[as.integer(rank)]
) %>%
pivot_wider(names_from = rank, values_from = choice) %>%
select(responseid, first, second, third, fourth, fifth)
#> # A tibble: 4 x 6
#> responseid first second third fourth fifth
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 001 C B F E A
#> 2 002 F E C B D
#> 3 003 B D A E F
#> 4 004 D C E F B
Created on 2020-03-04 by the reprex package (v0.3.0)
Upvotes: 6