Reputation: 897
I am thoruoghly researching this question on SO from the very morning. Original dataset has more than 1000 rows. My global goal is to extract particular columns to run an OLS-regression.
I selected the columns I need and transformed it to a wide format using pivot_wider
. In the transformed table I have 5 columns which represent indicators name. The rows are respondets' ids, the values are answers.
The problem is that after the transformation the values mutated into the nested objects. I tried to resolve this issue on a sample dataset using unnest(cols = everything())
. And it works fine:
examp_df <- tibble(
seance = rep(1:5, each = 5),
ind = rep(inds, 5),
ind_name = rep(inds_name, 5),
answer = list(rep(rnorm(5, 0.7, 1), 5))
)
examp_df_wide <- examp_df %>%
pivot_wider(id_cols = seance,
names_from = ind_name,
values_from = answer)
exmap_df_wide <- examp_df_wide %>%
unnest(cols = everything())
But when I try it on my original dataset, I receive an error about incompatiability of length. And then I do not understand how unnest
works.
Here's the dataset which I have problems with. How can I unnest the data?
Original data is here.
The code fot the original data is the following:
data_all <- data_all %>%
pivot_wider(id_cols = seance_id,
names_from = ind_name,
values_from = criteria_answ)
> data_all <- data_all %>%
+ unnest(cols = everything())
Error: Incompatible lengths: 4, 5.
Run `rlang::last_error()` to see where the error occurred.
Upvotes: 1
Views: 1174
Reputation: 389115
If you want an unnested dataframe you can do :
library(tidyr)
pivot_wider(data_all, names_from = ind_name, values_from = criteria_answ)
# A tibble: 3,930 x 7
# seance_id criteria name2 name1 name3 name5 name4
# <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 3133688 ind3_2 7 NA NA NA NA
# 2 3133688 ind4_2 6 NA NA NA NA
# 3 3133688 ind3_3 NA 7 NA NA NA
# 4 3133688 ind3_4 NA NA 7 NA NA
# 5 3133688 ind4_3 NA 6 NA NA NA
# 6 3133688 ind4_4 NA NA 6 NA NA
# 7 3133688 nps NA NA NA 5 NA
# 8 3145092 ind1_1 NA NA NA NA 5
# 9 3145092 ind1_2 4 NA NA NA NA
#10 3145092 ind1_3 NA 5 NA NA NA
# … with 3,920 more rows
If you want the output where every seance_id
is in 1 row you need to think how will you show those values that have more than 1 value in a column for a seance_id
? For example, if you look at the above output seance_id = 3133688
has two values in name2
column. To collapse 3133688
into one row how will you combine these values? Do you want to take their sum
, mean
or combine them as one comma separated value. You can use values_fn
argument in pivot_wider
and pass a function to apply. For example, with toString
:
pivot_wider(data_all, id_cols = seance_id, names_from = ind_name,
values_from = criteria_answ, values_fn = toString)
# A tibble: 422 x 6
# seance_id name2 name1 name3 name5 name4
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 3133688 7, 6 7, 6 7, 6 5 NA
# 2 3145092 4, 5, 5, 8 5, 5, 5, 9 5, 5, 6, 7 3 5, 6, 5, 8
# 3 3143656 10 10 10 10 10
# 4 3145088 9, 9, 9 10, 8, 8 9, 10, 7 8 9, 10, 10
# 5 3145117 6, 4, 7 7, 6, 9 7, 6, 9 6 8, 8, 9
# 6 3148589 10, 10, 7 10, 10, 5 8, 9, 5 9 10, 10, 7
# 7 3135731 10, 9, 7 9, 8, 6 8, 8, 8 8 10, 9, 7
# 8 3145111 7, 7, 7, 8, 9 10, 10, 9, 8, 9 9, 7, 7, 8, 9 4 9, 7, 9, 8, 9
# 9 3149981 8, 8, 8, 8 8, 8, 8, 9 8, 8, 8, 8 9 9, 8, 8, 9
#10 3150048 9 10 10 10 9
# … with 412 more rows
Upvotes: 1