VB_

Reputation: 45692

Spark dataset custom partitioner

Could you please help me find a Java API for repartitioning the sales dataset into N partitions of equal size? By equal size I mean an equal number of rows.

Dataset<Row> sales = sparkSession.read().parquet(salesPath);
sales.toJavaRDD().partitions().size(); // returns 1

Upvotes: 1

Views: 3998

Answers (1)

TheGT

Reputation: 412

AFAIK, custom partitioners are not supported for Datasets. The whole idea of the Dataset and DataFrame APIs in Spark 2+ is to abstract away the need to meddle with custom partitioners. So if you face data skew and reach the point where a custom partitioner is the only option, you would drop down to RDD-level manipulation.
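That said, for the equal-row-count goal in the question a custom partitioner may not be needed at all: `Dataset.repartition(N)` distributes rows round-robin in recent Spark versions, which yields near-equal partition sizes. A minimal plain-Java sketch of why round-robin gives this property (`RoundRobinDemo` and `partitionCounts` are illustrative names, not Spark APIs):

```java
import java.util.Arrays;

// Sketch only, not the Spark API: shows how round-robin assignment of
// R rows to N partitions produces partition sizes differing by at most one.
public class RoundRobinDemo {
    // Assign row i to partition i % numPartitions and count rows per partition.
    static int[] partitionCounts(int numRows, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numRows; i++) {
            counts[i % numPartitions]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 10 rows over 3 partitions: sizes [4, 3, 3].
        System.out.println(Arrays.toString(partitionCounts(10, 3)));
    }
}
```

So `sales.repartition(N)` alone should get the asker balanced partitions by row count, without touching the RDD API.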

For example, see the Facebook use-case study and the Spark Summit talk related to that use case.

Defining partitioners for RDDs is well documented in the API docs.
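The contract a custom RDD partitioner fulfils is small: `org.apache.spark.Partitioner` requires `numPartitions()` and `getPartition(key)`. As a sketch, here is a standalone class mirroring the hash logic of Spark's `HashPartitioner` without depending on Spark itself (in real code you would extend `org.apache.spark.Partitioner` and pass the instance to `JavaPairRDD.partitionBy`):

```java
// Illustrative sketch of the Partitioner contract; not a drop-in Spark class.
public class SimpleHashPartitioner {
    private final int numPartitions;

    public SimpleHashPartitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    public int numPartitions() {
        return numPartitions;
    }

    // Map a key to a partition index in [0, numPartitions).
    public int getPartition(Object key) {
        if (key == null) {
            return 0;                              // Spark sends null keys to partition 0
        }
        int mod = key.hashCode() % numPartitions;  // may be negative in Java
        return mod < 0 ? mod + numPartitions : mod;
    }

    public static void main(String[] args) {
        SimpleHashPartitioner p = new SimpleHashPartitioner(4);
        System.out.println(p.getPartition("order-42"));
    }
}
```

Note that a hash partitioner like this balances keys, not rows, so with skewed keys the partitions can still be uneven; that is exactly the skew scenario discussed above.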

Upvotes: 3
