Reputation: 1809
I have a Spark (version 2.4.7) job:

JavaRDD<Row> rows = javaSparkContext.newAPIHadoopFile(...)
    .map(d -> {
        Foo foo = Foo.parseFrom(d._2.copyBytes());
        String v1 = foo.getField1();
        int v2 = foo.getField2();
        double[] v3 = foo.getField3();
        return RowFactory.create(v1, v2, v3);
    });
spark.createDataFrame(rows, schema).createOrReplaceTempView("table1");
spark.sql("...sum/avg/group-by...").show();
The Foo class here is a complex generated Google protobuf class.
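For reference, schema above is a StructType built to mirror the Row fields; here is a minimal sketch of it, assuming field1 maps to a string, field2 to an int, and field3 to an array of doubles (the field names are placeholders matching the getters above, not my real .proto):

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Schema matching RowFactory.create(v1, v2, v3) in the map step.
StructType schema = new StructType(new StructField[]{
    new StructField("field1", DataTypes.StringType, false, Metadata.empty()),
    new StructField("field2", DataTypes.IntegerType, false, Metadata.empty()),
    new StructField("field3", DataTypes.createArrayType(DataTypes.DoubleType), false, Metadata.empty())
});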
I have several questions:
Many thanks.
Upvotes: 2
Views: 187