Reputation: 897
I have a very specific dataframe. It is basically tha sample of answers on my survey. Variables v1
represent a multiple choice question; the respondent were to choose variants from 1 to 4, he/she could choose the only or several options. Every chosen variant readressed the respondent to the block: if 1
was chosen, the respondent was readressed to o_v1_1
, c_v1_1
, and f_v1_1
and so on.
I want to write a function that will change my data structure from wide to long. But I am struggling with pivot_longer
, because it does not produce a desirable output.
Here's the sample dataframe with some initial data processing:
structure(list(seance_id = c(1, 2, 3, 4), respondent = c("A",
"B", "C", "D"), v1...3 = c(1, 1, NA, 1), v1...4 = c(2, NA, 2,
NA), v1...5 = c(3, 4, 4, NA), v1...6 = c(4, NA, NA, NA), o_v1_1 = c(6,
1, NA, 4), c_v1_1 = c(7, 1, NA, 1), f_v1_1 = c(8, 1, NA, 1),
o_v1_2 = c(10, NA, 4, NA), c_v1_2 = c(8, NA, 1, NA), f_v1_2 = c(3,
NA, 3, NA), o_v1_3 = c(4, NA, NA, NA), c_v1_3 = c(1, NA,
NA, NA), f_v1_3 = c(2, NA, NA, NA), o_v1_4 = c(10, 5, 4,
NA), c_v1_4 = c(9, 6, 5, NA), f_v1_4 = c(9, 6, 6, NA)), row.names = c(NA,
-4L), class = c("tbl_df", "tbl", "data.frame"))
data <- data %>% mutate_if(is.numeric, as.character)
colnames(data) <- c("seance_id", "respondent", "v1", "v1", "v1", "v1", "o_v1_1",
"c_v1_1", "f_v1_1", "o_v1_2", "c_v1_2", "f_v1_2", "o_v1_3", "c_v1_3",
"f_v1_3", "o_v1_4", "c_v1_4", "f_v1_4")
And here's how I tried to make my table long:
long <- data %>%
pivot_longer(cols = -`seance_id`, names_to = "v1", values_to = "answer")
And this is what I want to get:
`séance_id` respondent direction answer_dir criteria criteria_answer
<dbl> <chr> <chr> <dbl> <chr> <dbl>
1 1 A v1 1 o_v1_1 6
2 1 A v1 1 c_v1_1 7
3 1 A v1 1 f_v1_1 8
4 1 A v1 2 o_v1_2 10
5 1 A v1 2 c_v1_2 8
6 1 A v1 2 f_v1_2 3
I have been researching SO for 2 days already and have not resolved my problem yet. How can I use pivot_longer
effectively to get the desirable output? And is there any way to automate the process of longing my dfs? I have more than 30 of dfs, nested in different lists within one Excel file.
Upvotes: 0
Views: 37
Reputation: 5766
You can drop you initial v1
columns. I instead propose the following:
data %>% select(-starts_with('v1')) %>%
pivot_longer(cols = contains('v1'), names_to = "v1", values_to = "criteria_answer") %>%
separate(v1, sep='_', into=c('w','direction','answer_dir'), remove=FALSE) %>%
rename(creteria=v1) %>% select(-w)
# A tibble: 48 x 6
seance_id respondent creteria direction answer_dir criteria_answer
<chr> <chr> <chr> <chr> <chr> <chr>
1 1 A o_v1_1 v1 1 6
2 1 A c_v1_1 v1 1 7
3 1 A f_v1_1 v1 1 8
4 1 A o_v1_2 v1 2 10
5 1 A c_v1_2 v1 2 8
6 1 A f_v1_2 v1 2 3
7 1 A o_v1_3 v1 3 4
8 1 A c_v1_3 v1 3 1
9 1 A f_v1_3 v1 3 2
10 1 A o_v1_4 v1 4 10
# ... with 38 more rows
The final select(-w)
is to remove the w
-column, an artefact from separate
splitting o_v1_1
,c_v1_1
,etc. into 3 columns. Here w
was for the first character.
Upvotes: 2