rg4s
rg4s

Reputation: 897

R: how does unnest function work and how to overcome incompatiable length error?

I am thoruoghly researching this question on SO from the very morning. Original dataset has more than 1000 rows. My global goal is to extract particular columns to run an OLS-regression.

I selected the columns I need and transformed it to a wide format using pivot_wider. In the transformed table I have 5 columns which represent indicators name. The rows are respondets' ids, the values are answers.

The problem is that after the transformation the values mutated into the nested objects. I tried to resolve this issue on a sample dataset using unnest(cols = everything()). And it works fine:

examp_df <- tibble(
  seance = rep(1:5, each = 5),
  ind = rep(inds, 5),
  ind_name = rep(inds_name, 5),
  answer = list(rep(rnorm(5, 0.7, 1), 5))
)

examp_df_wide <- examp_df %>%
  pivot_wider(id_cols = seance,
              names_from = ind_name,
              values_from = answer) 


exmap_df_wide <- examp_df_wide %>%
  unnest(cols = everything())

But when I try it on my original dataset, I receive an error about incompatiability of length. And then I do not understand how unnest works.

Here's the dataset which I have problems with. How can I unnest the data?

The list of sources that I have researched:

  1. Pivot wider produces nested object

  2. R: Error: Incompatible lengths when using unnest in dplyr

  3. Unnest or unchop dataframe containing lists of different lengths

  4. https://tidyr.tidyverse.org/articles/nest.html

Original data is here.

The code fot the original data is the following:

data_all <- data_all %>%
  pivot_wider(id_cols = seance_id,
              names_from = ind_name,
              values_from = criteria_answ)
> data_all <- data_all %>%
+   unnest(cols = everything())
Error: Incompatible lengths: 4, 5.
Run `rlang::last_error()` to see where the error occurred.

Upvotes: 1

Views: 1174

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389115

If you want an unnested dataframe you can do :

library(tidyr)

pivot_wider(data_all, names_from = ind_name, values_from = criteria_answ)

# A tibble: 3,930 x 7
#   seance_id criteria name2 name1 name3 name5 name4
#   <chr>     <chr>    <chr> <chr> <chr> <chr> <chr>
# 1 3133688   ind3_2   7     NA    NA    NA    NA   
# 2 3133688   ind4_2   6     NA    NA    NA    NA   
# 3 3133688   ind3_3   NA    7     NA    NA    NA   
# 4 3133688   ind3_4   NA    NA    7     NA    NA   
# 5 3133688   ind4_3   NA    6     NA    NA    NA   
# 6 3133688   ind4_4   NA    NA    6     NA    NA   
# 7 3133688   nps      NA    NA    NA    5     NA   
# 8 3145092   ind1_1   NA    NA    NA    NA    5    
# 9 3145092   ind1_2   4     NA    NA    NA    NA   
#10 3145092   ind1_3   NA    5     NA    NA    NA   
# … with 3,920 more rows

If you want the output where every seance_id is in 1 row you need to think how will you show those values that have more than 1 value in a column for a seance_id? For example, if you look at the above output seance_id = 3133688 has two values in name2 column. To collapse 3133688 into one row how will you combine these values? Do you want to take their sum, mean or combine them as one comma separated value. You can use values_fn argument in pivot_wider and pass a function to apply. For example, with toString :

pivot_wider(data_all, id_cols = seance_id, names_from = ind_name,
            values_from = criteria_answ, values_fn = toString)

# A tibble: 422 x 6
#   seance_id name2         name1           name3         name5 name4        
#   <chr>     <chr>         <chr>           <chr>         <chr> <chr>        
# 1 3133688   7, 6          7, 6            7, 6          5     NA           
# 2 3145092   4, 5, 5, 8    5, 5, 5, 9      5, 5, 6, 7    3     5, 6, 5, 8   
# 3 3143656   10            10              10            10    10           
# 4 3145088   9, 9, 9       10, 8, 8        9, 10, 7      8     9, 10, 10    
# 5 3145117   6, 4, 7       7, 6, 9         7, 6, 9       6     8, 8, 9      
# 6 3148589   10, 10, 7     10, 10, 5       8, 9, 5       9     10, 10, 7    
# 7 3135731   10, 9, 7      9, 8, 6         8, 8, 8       8     10, 9, 7     
# 8 3145111   7, 7, 7, 8, 9 10, 10, 9, 8, 9 9, 7, 7, 8, 9 4     9, 7, 9, 8, 9
# 9 3149981   8, 8, 8, 8    8, 8, 8, 9      8, 8, 8, 8    9     9, 8, 8, 9   
#10 3150048   9             10              10            10    9            
# … with 412 more rows

Upvotes: 1

Related Questions