Reputation: 51
Can somebody share an example of reading Avro using Java in Spark?
I found Scala examples, but no luck with Java.
Here is the snippet, part of a larger program, that runs into compilation issues with the method ctx.newAPIHadoopFile:
JavaSparkContext ctx = new JavaSparkContext(sparkConf);
Configuration hadoopConf = new Configuration();
JavaRDD<SampleAvro> lines = ctx.newAPIHadoopFile(path, AvroInputFormat.class, AvroKey.class, NullWritable.class, new Configuration());
Regards
Upvotes: 2
Views: 5584
Reputation: 142
Here, assuming K is your key type and V is your value type:
...
Job job = Job.getInstance();
job.setInputFormatClass(AvroKeyValueInputFormat.class);
FileInputFormat.addInputPaths(job, <inputPaths>);
AvroJob.setInputKeySchema(job, <keySchema>);
AvroJob.setInputValueSchema(job, <valueSchema>);

// Java class literals cannot carry type parameters, so the raw classes
// are passed and the generics are declared on the resulting pair RDD.
JavaPairRDD<AvroKey<K>, AvroValue<V>> avroRDD =
    sc.newAPIHadoopRDD(job.getConfiguration(),
        AvroKeyValueInputFormat.class,
        AvroKey.class,
        AvroValue.class);
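To make that concrete, here is a self-contained sketch of my own (not from the snippet above). It assumes the input files hold Avro key/value pairs and uses GenericRecord on both sides; the class name, app name, and input path are placeholders, and the avro-mapred (hadoop2) artifact must be on the classpath:

import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroKeyValueInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class AvroReadSketch {  // placeholder class name
    public static void main(String[] args) throws Exception {
        // Local master only for testing; drop it when launching via spark-submit
        JavaSparkContext sc = new JavaSparkContext(
            new SparkConf().setAppName("avro-read").setMaster("local[*]"));

        Job job = Job.getInstance();
        FileInputFormat.addInputPaths(job, "/path/to/avro/input");  // placeholder path
        // Reader schemas could be set here via AvroJob.setInputKeySchema /
        // setInputValueSchema; without them the writer's schema in the file is used.

        // Java class literals are raw, hence the unchecked cast on the result.
        @SuppressWarnings("unchecked")
        JavaPairRDD<AvroKey<GenericRecord>, AvroValue<GenericRecord>> avroRDD =
            (JavaPairRDD<AvroKey<GenericRecord>, AvroValue<GenericRecord>>) (JavaPairRDD<?, ?>)
                sc.newAPIHadoopRDD(job.getConfiguration(),
                    AvroKeyValueInputFormat.class, AvroKey.class, AvroValue.class);

        // Unwrap the AvroKey containers to get at the records themselves
        JavaRDD<GenericRecord> records = avroRDD.map(
            new Function<Tuple2<AvroKey<GenericRecord>, AvroValue<GenericRecord>>, GenericRecord>() {
                @Override
                public GenericRecord call(
                        Tuple2<AvroKey<GenericRecord>, AvroValue<GenericRecord>> pair) {
                    return pair._1().datum();
                }
            });

        System.out.println("record count: " + records.count());
        sc.stop();
    }
}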
Upvotes: 1
Reputation: 2137
You can use the spark-avro connector library by Databricks.
The recommended way to read or write Avro data with Spark SQL is through the DataFrame API; the connector supports both directions:
import org.apache.spark.sql.*;

SQLContext sqlContext = new SQLContext(sc);

// Creates a DataFrame from a specified file
DataFrame df = sqlContext.read().format("com.databricks.spark.avro")
        .load("src/test/resources/episodes.avro");

// Saves the subset of the Avro records read in
df.filter("age > 5").write()
        .format("com.databricks.spark.avro")
        .save("/tmp/output");
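As a follow-up sketch (not from the library's documentation), the same DataFrame can also be queried with plain SQL once it is registered as a temporary table; the table name episodes and the age column are just illustrative:

// Register the Avro-backed DataFrame as a temp table and query it with SQL
df.registerTempTable("episodes");
DataFrame adults = sqlContext.sql("SELECT * FROM episodes WHERE age > 5");
adults.show();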
Note that this connector has different versions for Spark 1.2, 1.3, and 1.4+:
Spark version    connector version
1.2              0.2.0
1.3              1.0.0
1.4+             2.0.1
Using Maven:
<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>{AVRO_CONNECTOR_VERSION}</version>
</dependency>
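If you are just experimenting from the shell, the same artifact can instead be pulled in at launch time (Spark 1.3+), for example:

spark-shell --packages com.databricks:spark-avro_2.10:2.0.1

Pick the connector version matching your Spark release per the table above.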
See further info at: Spark SQL Avro Library
Upvotes: 2