Mohammad Mahfooz Alam

Reputation: 59

Get the record count per partition in Spark using a DataFrame, without ignoring empty partitions

import org.apache.spark.sql.functions.spark_partition_id

df.groupBy(spark_partition_id()).count().show()

The example above does not show empty partitions: groupBy only produces a group for spark_partition_id values that occur in at least one row, so partitions containing no rows are silently dropped.
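For reference, the behavior is easy to reproduce. The following is a minimal sketch (not from the original question), assuming a SparkSession named spark is in scope:

// A single-row DataFrame repartitioned to 5 partitions leaves four of
// them empty; groupBy silently drops those four, since no row carries
// their spark_partition_id value.
import org.apache.spark.sql.functions.spark_partition_id
import spark.implicits._

val tiny = Seq(1).toDF("x").repartition(5)

tiny.groupBy(spark_partition_id()).count().show()
// The output contains a single row, for the one non-empty partition.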

Upvotes: 0

Views: 437

Answers (1)

qaziqarta

Reputation: 1944

Perhaps you can achieve this with mapPartitions:

// We first coalesce to 5 partitions only for display purposes.
// mapPartitions runs its function once per partition, so empty
// partitions still yield a row (their iterator size is 0); the
// Encoder[Int] for the result comes from spark.implicits._.
import spark.implicits._

df.coalesce(5).mapPartitions(it => Iterator(it.size)).show
+-----+
|value|
+-----+
|    0|
|    0|
|    0|
|    0|
|    1|
+-----+
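If the partition index is wanted in the output as well, a variant of the same idea goes through the underlying RDD, where mapPartitionsWithIndex exposes it. The following is a minimal sketch, not part of the original answer; it assumes a SparkSession named spark and the DataFrame df are in scope:

// (partition index, row count) pairs, including empty partitions.
import spark.implicits._

val countsPerPartition = df.rdd
  .mapPartitionsWithIndex { (idx, it) =>
    // it.size consumes the iterator; an empty partition still emits (idx, 0).
    Iterator((idx, it.size))
  }
  .toDF("partition", "count")

countsPerPartition.show()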

Upvotes: 1
