Reputation: 681
S3 and GCS are not block storage, unlike HDFS, so it is not clear to me how Spark creates partitions when reading from these sources. I am currently reading from GCS, and I get 2 partitions both for small files (~10 bytes) and for medium-sized files (~100 MB).
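A stripped-down example of the kind of read I mean (bucket and file names are placeholders; here via `sc.textFile`):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gcs-partitions").getOrCreate()
val sc = spark.sparkContext

// Placeholder paths: small.txt is ~10 bytes, medium.txt is ~100 MB.
val small  = sc.textFile("gs://my-bucket/small.txt")
val medium = sc.textFile("gs://my-bucket/medium.txt")

println(small.getNumPartitions)   // 2
println(medium.getNumPartitions)  // also 2
```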
Does anyone have an explanation?
Upvotes: 1
Views: 538
Reputation: 13430
Generally it's a configuration option: "how big to lie about partition size". Object-store connectors have no real blocks, so they simply report a configurable block size, and that value feeds into the split calculation that determines your partitions.
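A minimal sketch of what I mean; I believe the relevant property for the GCS connector is `fs.gs.block.size` (the S3A equivalent is `fs.s3a.block.size`), usually defaulting to around 64 MB, and the bucket/path below is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fake-block-size").getOrCreate()
val sc = spark.sparkContext

// The connector reports this configured "block size" to Hadoop's split
// calculation, which caps the split size for RDD-style reads. Lowering the
// value here should therefore produce more, smaller partitions per file.
sc.hadoopConfiguration.setLong("fs.gs.block.size", 16L * 1024 * 1024)

// gs://my-bucket/medium.txt is a placeholder for a ~100 MB file.
val rdd = sc.textFile("gs://my-bucket/medium.txt")
println(rdd.getNumPartitions) // noticeably more than the 2 seen with the default
```

Note that the setting has to be in effect before the filesystem is first touched, since the connector reads it when it initializes.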
Upvotes: 0