Markus

Reputation: 3782

How to migrate a particular piece of code from Spark 1.6.2 to Spark 2.2.0?

I need to port my Spark 1.6.2 code to Spark 2.2.0 in Java.

 DataFrame eventsRaw = sqlContext.sql("SELECT * FROM my_data");
 Row[] rddRows = eventsRaw.collect();
 for (int rowIdx = 0; rowIdx < rddRows.length; ++rowIdx)
 {
     Map<String, String> myProperties = new HashMap<>();
     myProperties.put("startdate", rddRows[rowIdx].get(1).toString());
     JEDIS.hmset("PK:" + rddRows[rowIdx].get(0).toString(), myProperties); // JEDIS is a Redis client for Java
 }

As far as I understand, there is no DataFrame in the Spark 2.2.0 Java API, only Dataset. However, if I substitute DataFrame with Dataset, then eventsRaw.collect() returns Object[] instead of Row[]. Then get(1) is marked in red and the code does not compile.

How can I correctly do it?

Upvotes: 0

Views: 142

Answers (1)

Alper t. Turker

Reputation: 35229

In Spark 2.x, DataFrame (Scala) is just a type alias for Dataset<Row>, so in Java you use Dataset<Row> directly:

SparkSession spark;

...

Dataset<Row> eventsRaw = spark.sql("SELECT * FROM my_data");

but instead of collect you should rather use foreach (with a lazy singleton connection):

eventsRaw.foreach(
   (ForeachFunction<Row>) row -> ... // replace ... with appropriate logic
);
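The "lazy singleton connection" hint can be sketched in plain Java, independent of Spark. The idea is that the holder's static field is initialized on first use inside each executor JVM, so the (non-serializable) connection object is never captured by the closure that Spark ships to the cluster. The `RedisConnection` class below is a stand-in for a real client such as Jedis, assumed here purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a lazy singleton connection holder (assumed names, not a real API).
class ConnectionHolder {

    // Stand-in for a real client such as Jedis; records writes in memory
    // so the sketch is self-contained.
    static class RedisConnection {
        final Map<String, Map<String, String>> store = new HashMap<>();

        void hmset(String key, Map<String, String> props) {
            store.put(key, new HashMap<>(props));
        }
    }

    private static RedisConnection instance;

    // Lazily create exactly one connection per JVM; synchronized so that
    // concurrent tasks in the same executor do not race on initialization.
    static synchronized RedisConnection get() {
        if (instance == null) {
            instance = new RedisConnection();
        }
        return instance;
    }
}
```

Inside the foreach lambda one would then call something like `ConnectionHolder.get().hmset("PK:" + row.get(0), myProperties)`, so every task running in the same executor reuses the same connection instead of serializing one from the driver.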

or foreachPartition (initialize one connection per partition):

eventsRaw.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
   ... // rows is an Iterator<Row>; replace ... with appropriate logic
});

Upvotes: 2
