Reputation: 1174
Is it possible to execute foreach on a dataframe so that I can return a dataset? I have a requirement that can only be satisfied by processing the records in order, so I am using foreach over the dataframe, but I need to create a new dataset from the result so I can write it into a parquet output file. This pseudo-code is what I would like to accomplish:
dataframe.foreachPartition( it => {
  // process records . . .
  // write the results from this partition into a file for aggregation later
  sparkSession.write . . .
})
// read a dataframe containing all the data sets written by the tasks
// read a dataframe containing all the data sets written by the tasks
sparkSession.read . . .
I know that is pretty sparse, but it summarizes what I need to do. The call to sparkSession.write is not allowed inside the foreach, so I am wondering if there is another way.
Upvotes: 2
Views: 1990
Reputation: 7316
Actually, you don't have access to DataFrames or Datasets within foreachPartition. That's because Datasets and DataFrames, like other Spark entities such as the SparkSession, are available only from the driver code.
One alternative, though, would be to generate the parquet files directly using the Hadoop API within foreachPartition, since your partition's data is accessible there:
dfB.repartition(2).foreachPartition( iter => {
  iter.foreach(i => println(i))
})
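As a minimal sketch of that idea, each task can write its own part file through the Hadoop FileSystem API and the driver can read the directory back afterwards. The output path `/tmp/ordered-output` is a made-up example, and I'm writing plain CSV lines rather than actual Parquet to keep the sketch short:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.TaskContext
import org.apache.spark.sql.Row

// Sketch: each partition writes its own file via the Hadoop FS API.
// This runs on the executors, so no SparkSession is needed inside the closure.
dfB.foreachPartition { iter: Iterator[Row] =>
  val fs   = FileSystem.get(new Configuration())
  // Name each file after its partition id so the tasks never collide.
  val path = new Path(s"/tmp/ordered-output/part-${TaskContext.getPartitionId()}")
  val out  = fs.create(path)
  try iter.foreach(row => out.writeBytes(row.mkString(",") + "\n")) // process in order, then emit
  finally out.close()
}

// Back on the driver, read everything the tasks wrote:
val result = sparkSession.read.csv("/tmp/ordered-output")
```

Note that this trades Spark's managed output commit protocol for manual file handling, so you are responsible for cleaning up partial files if a task fails and retries.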
Here is another thread that describes this issue and its solution in depth.
Good luck
Upvotes: 1