Reputation: 77
I'm working with sparklyr to read Parquet files from an S3 bucket, and I'm facing an issue when trying to read multiple files. Reading a single specific file works fine, but when I attempt to read all files in a directory, the operation runs indefinitely. Here's a simplified version of the code I'm using:
library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE
sc <- spark_connect(master = "local", config = config)

sparklyr::spark_read_parquet(
  sc,
  name = 'test',
  #path = 's3a://.../../data_01_04.parquet', #works fine
  #path = 's3a://.../../' #does not work
  #path = 's3a://.../../*.parquet' #does not work
)
Am I missing something in the way I'm specifying the path for reading multiple files? Any insights or suggestions would be greatly appreciated.
Upvotes: 0
Views: 59
Reputation: 130
Have you tried enabling recursive file lookup? Also, set the path so it ends with the folder name, without a trailing /.
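As a minimal sketch, it could look like this. The bucket/folder path is a placeholder (the original paths are elided), and this assumes Spark 3.0+, where the `recursiveFileLookup` data source option is available and can be passed through `spark_read_parquet()`'s `options` argument:

```r
library(sparklyr)

config <- spark_config()
config$sparklyr.connect.enablehivesupport <- FALSE
sc <- spark_connect(master = "local", config = config)

# Point the path at the folder itself (no trailing slash, no glob)
# and let Spark discover the Parquet files inside it recursively.
df <- spark_read_parquet(
  sc,
  name = "test",
  path = "s3a://your-bucket/your-folder",  # placeholder path
  options = list(recursiveFileLookup = "true")
)
```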
Upvotes: 0