alex

Reputation: 77

Sparklyr: Reading multiple Parquet files from S3 runs indefinitely

I'm working with Sparklyr to read Parquet files from an S3 bucket, and I'm facing an issue when trying to read multiple files. Reading a specific file works fine, but when attempting to read all files in a directory, the operation runs indefinitely. Here's a simplified version of the code I'm using:

library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE

sc <- spark_connect(master = "local", config = config)

sparklyr::spark_read_parquet(
  sc,
  name = 'test',
  #path = 's3a://.../../data_01_04.parquet' #works fine
  #path = 's3a://.../../' #does not work
  #path = 's3a://.../../*.parquet' #does not work
)

Am I missing something in how I'm specifying the path when reading multiple files? Any insights or suggestions would be greatly appreciated.

Upvotes: 0

Views: 59

Answers (1)

Gin

Reputation: 130

Have you tried enabling recursive file lookup? Also try pointing the path at the folder itself, without a trailing /.
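A minimal sketch of that suggestion, assuming a hypothetical bucket path (`s3a://your-bucket/your-folder` is a placeholder, not your actual path). `spark_read_parquet()` accepts an `options` list that is forwarded to the underlying Spark data source, and `recursiveFileLookup` is a Spark 3+ file source option:

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# Placeholder path: point at the folder itself, with no trailing slash
sparklyr::spark_read_parquet(
  sc,
  name = "test",
  path = "s3a://your-bucket/your-folder",
  options = list(recursiveFileLookup = "true")  # pick up Parquet files in subdirectories
)
```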

Upvotes: 0
