Reputation: 698
When I run tidyr::unnest_wider as part of a pipe, e.g.
df <- df %>%
  unnest_wider(col, names_sep = "_", names_repair = "universal")
R crashes and reports the following errors:
[91205:91206:20220410,071753.955164:ERROR file_io_posix.cc:148] open /home/matt/.r/crashpad_database/pending/dc2183c4-0851-4c62-908e-7d4e41a2702e.lock: File exists (17)
[91205:91205:20220410,071753.957703:ERROR process_memory_range.cc:86] read out of range
[91205:91205:20220410,071753.957712:ERROR elf_image_reader.cc:558] missing nul-terminator
[91205:91205:20220410,071753.957794:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960132:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960189:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960236:ERROR elf_dynamic_array_reader.h:61] tag not found
[91205:91205:20220410,071753.960281:ERROR elf_dynamic_array_reader.h:61] tag not found
The data frame I'm working with is quite complex with many columns, several of which are nested. Here's the structure of the column I'm trying to unnest (showing only the first 3 elements as the other 1453 are similar):
> str(spatial_firing$shuffle_results_b_o)
List of 1456
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.004358 0.036137 0.015214 0.001598 0.000695 ...
..$ slope : num [1:1000] 0.01191 0.02476 0.02087 -0.00629 -0.00461 ...
..$ p.value : num [1:1000] 0.616 0.146 0.348 0.762 0.842 ...
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.00722 0.01216 0.00534 0.01284 0.04958 ...
..$ slope : num [1:1000] 0.0203 -0.0223 -0.0157 0.0209 -0.0448 ...
..$ p.value : num [1:1000] 0.5186 0.4015 0.5791 0.3886 0.0873 ...
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.000795 0.001298 0.00132 0.000165 0.033603 ...
..$ slope : num [1:1000] -0.00453 0.00685 0.00645 -0.0024 0.04619 ...
..$ p.value : num [1:1000] 0.831 0.785 0.783 0.922 0.161 ...
Here is a reproducible example:
library(tidyverse)
f <- function(n) {
  df <- tibble(neuron = 0:(n - 1), r.squared = rnorm(n),
               slope = rnorm(n), p.value = rnorm(n))
  df$p.value[2] <- NA
  df
}
df <- replicate(1000, f(1000), simplify = FALSE)
dff <- tibble(x = df)
for (i in 1:100) {
  cat(i, "\n")
  unnest_wider(dff, x)
}
On my machine, introducing the NAs makes this crash, typically at iteration 3 or 4. The code in the answer below crashes without the NAs, but that does not happen reliably on my machine.
Things I've tried include:
I'm running R version 4.1.3 on Ubuntu 20.04.4.
I'd welcome suggestions for solutions or additional troubleshooting tests.
Upvotes: 1
Views: 326
Reputation: 698
The issue is fixed by updating r-lib/vctrs with:
devtools::install_github("r-lib/vctrs#1553")
I've confirmed that this works with the reproducible example and with the original issue.
Update: the fixed version of vctrs is now available from CRAN.
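For anyone checking whether they already have the fix, here is a minimal sketch for inspecting the installed vctrs version before and after updating. The helper name installed_vctrs is made up for illustration, and I'm not stating a version number because the text above doesn't give one.

```r
# Hypothetical helper: report the installed vctrs version,
# or NA if vctrs is not installed at all.
installed_vctrs <- function() {
  if (requireNamespace("vctrs", quietly = TRUE)) {
    as.character(utils::packageVersion("vctrs"))
  } else {
    NA_character_
  }
}

cat("vctrs currently installed:", installed_vctrs(), "\n")

# Uncomment to pull the current CRAN release (needs network access),
# then restart R so the new version is actually loaded:
# install.packages("vctrs")
```

Restarting the session after installing matters, since an already loaded namespace keeps using the old compiled code.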
Upvotes: 0
Reputation: 226532
This is not an answer, but a reproducible example. It seems like a bug in tidyr or somewhere in the underlying tidyverse machinery (a segmentation fault is by definition a bug: nothing an R end user does, short of messing around with compiled (C++/Fortran/etc.) code, should ever be able to crash the R session, except possibly through memory exhaustion).
I would post a tidyr issue about this if I were you ... (it also happens with the latest development version of tidyr).
library(tidyverse)
f <- function(n) tibble(neuron = 0:(n - 1), r.squared = rnorm(n),
                        slope = rnorm(n), p.value = rnorm(n))
df <- replicate(1000, f(1000), simplify = FALSE)
Results look like yours:
str(df[1:3])
List of 3
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.421 -0.445 0.816 0.752 0.635 ...
..$ slope : num [1:1000] 0.059 -1.4899 -0.0384 0.2601 -0.6293 ...
..$ p.value : num [1:1000] -0.754 1.023 0.123 0.817 0.382 ...
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.504 0.153 -1.397 0.938 0.948 ...
..$ slope : num [1:1000] 0.9693 -1.2223 -0.4863 1.0936 -0.0792 ...
..$ p.value : num [1:1000] -1.018 -2.313 -1.593 -0.528 0.783 ...
$ : tibble [1,000 × 4] (S3: tbl_df/tbl/data.frame)
..$ neuron : int [1:1000] 0 1 2 3 4 5 6 7 8 9 ...
..$ r.squared: num [1:1000] 0.73 -0.926 2.144 0.795 1.002 ...
..$ slope : num [1:1000] -1.622 -0.664 1.286 0.419 1.285 ...
..$ p.value : num [1:1000] 1.408 -1.458 -1.096 0.339 -0.295 ...
unnest_wider is supposed to be applied to a list-column, and you're showing us a list, so I'll make a list-column out of it.
This crashes on iteration 3 on my machine (segmentation fault with "memory not mapped").
dff <- tibble(x = df)
for (i in 1:100) {
  cat(i, "\n")
  unnest_wider(dff, x)
}
Tested with R unstable, Pop!_OS 21.04, tidyr version 1.2.0.
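Since the loop above takes the whole session down when it segfaults, one possible troubleshooting aid is to run it in a child R process and look at the exit status instead. This is only a sketch using base R: run_isolated and repro are names made up here, and the exact non-zero status reported after a crash depends on the platform.

```r
# Hypothetical helper: write R code to a temporary script, run it with the
# same R installation in a separate process, and return the exit status.
run_isolated <- function(lines) {
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script))
  writeLines(lines, script)
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(rscript, shQuote(script))
}

# The reproduction from above, as lines of a script (assumes tibble and
# tidyr are installed for the child process).
repro <- c(
  "suppressPackageStartupMessages({library(tibble); library(tidyr)})",
  "f <- function(n) tibble(neuron = 0:(n - 1), r.squared = rnorm(n),",
  "                        slope = rnorm(n), p.value = rnorm(n))",
  "dff <- tibble(x = replicate(1000, f(1000), simplify = FALSE))",
  "for (i in 1:100) {",
  "  cat(i, \"\\n\")",
  "  unnest_wider(dff, x)",
  "}"
)

# Uncomment to run; a crash in the child shows up as a non-zero status
# while the calling session keeps running.
# status <- run_isolated(repro)
# cat("child exit status:", status, "\n")
```

The same wrapper can be rerun after changing package versions to see whether the crash goes away, without losing the current workspace each time.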
Upvotes: 1