Reputation: 239
Is there any way to run multiple independent aggregation jobs on a single RDD in parallel? My first preference is Python, then Scala and Java.
The courses of action, in order of preference, are:

1. Using a thread pool: run different functions doing different aggregations on different threads. I have not seen an example that does this.
2. Using cluster mode on YARN, submitting different jars. Is this possible, and if yes, is it possible in PySpark?
3. Using Kafka: run different spark-submits against the DataFrame streaming through Kafka.
I am quite new to Spark, and my experience so far is running Spark on YARN for ETL, performing multiple aggregations serially. I was wondering whether these aggregations could be run in parallel, since they are mostly independent.
Upvotes: 0
Views: 690
Reputation: 40360
Considering your broad question, here is a broad answer:
Yes, it is possible to run multiple aggregation jobs on a single DataFrame in parallel.
As for the rest, it isn't clear what you are asking.
Upvotes: 0