Edge7

Reputation: 681

Number of Spark partitions when reading from buckets (S3, GCS)

Unlike HDFS, S3 and GCS are not block storage, so it is not clear to me how Spark creates partitions when reading from these sources. I am currently reading from GCS, and I get 2 partitions both for small files (10 bytes) and for medium-sized files (100 MB).

Does anyone have an explanation?

Upvotes: 1

Views: 538

Answers (1)

stevel

Reputation: 13430

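Generally it's a configuration option: "how big to lie about partition size". Object stores have no real blocks, so the filesystem connector reports a configurable block size, and Spark uses that figure when it decides how to split files into partitions.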
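
To make that concrete, here is a rough sketch of how the split count might come out if the read goes through the RDD API (e.g. `sc.textFile`), which as far as I know uses Hadoop's old `FileInputFormat` split logic with a default `minPartitions` of 2. The function name `count_splits` and the 64 MB / 32 MB block sizes are illustrative assumptions, not values taken from the question; the DataFrame reader follows different rules (`spark.sql.files.maxPartitionBytes` and friends), so treat this as an approximation rather than the definitive behaviour.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # slack factor: the last split may be up to 10% oversized


def count_splits(file_size, block_size, min_partitions=2, min_split_size=1):
    """Approximate the number of input splits for a single file.

    Hypothetical helper that mirrors, as a sketch, the arithmetic in
    Hadoop's old-API FileInputFormat: the split size is
    max(min_split_size, min(goal_size, block_size)).
    """
    # Spark asks for at least `min_partitions` splits, so the "goal" size
    # is the file size divided by that number (integer division).
    goal_size = file_size // max(min_partitions, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


# Assumed block size the connector reports (example value: 64 MB).
print(count_splits(10, 64 * MB))        # tiny file
print(count_splits(100 * MB, 64 * MB))  # medium file
print(count_splits(100 * MB, 32 * MB))  # same file, smaller reported block
```

Under these assumptions both the 10-byte file and the 100 MB file end up with 2 splits: the tiny file is simply halved to satisfy `minPartitions`, and the 100 MB file's goal size (50 MB) is already below the reported block size. Lowering the reported block size (for S3A that is `fs.s3a.block.size`; the GCS connector has `fs.gs.block.size`) or passing a larger `minPartitions` should raise the count.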

Upvotes: 0
