Reputation: 681
S3 and GCS are not block storage, unlike HDFS, so it is not clear to me how Spark creates partitions when reading from these sources. I am currently reading from GCS, and I get 2 partitions both for small files (~10 bytes) and for medium-sized files (~100 MB).
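A stripped-down example of the kind of read I mean (bucket and file names are placeholders; here via `sc.textFile`):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gcs-partitions").getOrCreate()
val sc = spark.sparkContext

// Placeholder paths: small.txt is ~10 bytes, medium.txt is ~100 MB.
val small  = sc.textFile("gs://my-bucket/small.txt")
val medium = sc.textFile("gs://my-bucket/medium.txt")

println(small.getNumPartitions)   // 2
println(medium.getNumPartitions)  // also 2
```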
Does anyone have an explanation?
Upvotes: 1
Views: 538
Reputation: 13430
Generally it's a configuration option: "how big to lie about partition size". Object-store connectors have no real blocks, so they simply report a configurable block size, and that value feeds into the split calculation that determines your partitions.
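A minimal sketch of what I mean; I believe the relevant property for the GCS connector is `fs.gs.block.size` (the S3A equivalent is `fs.s3a.block.size`), usually defaulting to around 64 MB, and the bucket/path below is a placeholder:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("fake-block-size").getOrCreate()
val sc = spark.sparkContext

// The connector reports this configured "block size" to Hadoop's split
// calculation, which caps the split size for RDD-style reads. Lowering the
// value here should therefore produce more, smaller partitions per file.
sc.hadoopConfiguration.setLong("fs.gs.block.size", 16L * 1024 * 1024)

// gs://my-bucket/medium.txt is a placeholder for a ~100 MB file.
val rdd = sc.textFile("gs://my-bucket/medium.txt")
println(rdd.getNumPartitions) // noticeably more than the 2 seen with the default
```

Note that the setting has to be in effect before the filesystem is first touched, since the connector reads it when it initializes.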
Upvotes: 0