Why does `open_dataset` in R Arrow consume all RAM and crash my system, but selective queries work fine?

I am unable to run the following call:

df <- open_dataset("datos_2023.parquet")

When I attempt it, my system's RAM quickly gets consumed, causing severe slowdowns. I often need to force close R, and sometimes the system crashes with a Windows blue screen and restarts.

However, when I run this operation, it works without any issues and completes in just a few seconds:

open_dataset("datos_2023.parquet") |> select(value) |> collect() |> sum()

Additionally, I have an older laptop where the same code runs without problems. What could be causing this issue?

I'm using arrow 17.0.0.1.
