alex

Reputation: 77

Sparklyr: Reading multiple Parquet files from S3 runs indefinitely

I'm working with Sparklyr to read Parquet files from an S3 bucket, and I'm facing an issue when trying to read multiple files. Reading a specific file works fine, but when attempting to read all files in a directory, the operation runs indefinitely. Here's a simplified version of the code I'm using:

library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local", config = config)

sparklyr::spark_read_parquet(
  sc,
  name = 'test',
  #path = 's3a://.../../data_01_04.parquet' #works fine
  #path = 's3a://.../../' #does not work
  #path = 's3a://.../../*.parquet' #does not work
)

Am I missing something in how I'm specifying the path when reading multiple files? Any insights or suggestions would be greatly appreciated.

Upvotes: 0

Views: 59

Answers (1)

Gin

Reputation: 130

Have you tried enabling recursive file lookup? Also try pointing the path at the folder itself, without a trailing /.
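A minimal sketch of that suggestion, assuming a hypothetical bucket path (`s3a://your-bucket/your-folder` is a placeholder, not your actual path). `spark_read_parquet()` accepts an `options` list that is forwarded to the underlying Spark data source, and `recursiveFileLookup` is a Spark 3+ file source option:

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Placeholder path: point at the folder itself, with no trailing slash
sparklyr::spark_read_parquet(
  sc,
  name = "test",
  path = "s3a://your-bucket/your-folder",
  options = list(recursiveFileLookup = "true")  # pick up Parquet files in subdirectories
)
```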

Upvotes: 0
