Reputation: 59
import org.apache.spark.sql.functions.spark_partition_id

df.groupBy(spark_partition_id()).count().show()
The example above does not show empty partitions.
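For example (a minimal sketch, assuming a hypothetical single-row DataFrame spread over 5 partitions):

import org.apache.spark.sql.functions.spark_partition_id

// Hypothetical setup: one row repartitioned into 5 partitions, so 4 are empty.
val df = spark.range(1).repartition(5)

// Only the one non-empty partition appears, because groupBy can
// only group rows that exist; empty partitions contribute none.
df.groupBy(spark_partition_id()).count().show()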
Upvotes: 0
Views: 437
Reputation: 1944
Perhaps you can achieve this with mapPartitions:
// We first coalesce to 5 partitions only for display purposes:
import spark.implicits._  // encoder for the Int results (pre-imported in spark-shell)
df.coalesce(5).mapPartitions(it => Iterator(it.size)).show()
+-----+
|value|
+-----+
| 0|
| 0|
| 0|
| 0|
| 1|
+-----+
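If you also want to see which partition each count belongs to, a possible variant (a sketch, not part of the original answer) is to drop to the RDD API, where mapPartitionsWithIndex visits every partition, empty or not:

import spark.implicits._  // for .toDF on the resulting RDD

df.coalesce(5)
  .rdd
  .mapPartitionsWithIndex((idx, it) => Iterator((idx, it.size)))
  .toDF("partition_id", "rows")
  .show()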
Upvotes: 1