Reputation: 3077
I am trying to convert my Json file to Parquet format.
Following is my pom file.
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.mypackage</groupId>
<artifactId>JSONToParquet</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>jar</packaging>
<repositories>
<repository>
<id>wso2</id>
<url>http://dist.wso2.org/maven2/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-data-core</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-morphlines-all</artifactId>
<version>1.0.0</version> <!-- or whatever the latest version is -->
<type>pom</type>
</dependency>
<!-- https://mvnrepository.com/artifact/ua_parser/ua-parser -->
<dependency>
<groupId>ua_parser</groupId>
<artifactId>ua-parser</artifactId>
<version>1.3.0</version>
<type>pom</type>
</dependency>
</dependencies>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
</properties>
</project>
Following is the code for conversion :
Schema jsonSchema = JsonUtil.inferSchema(inputstream, "Movie", 10);
try (JSONFileReader<Movie> reader = new JSONFileReader<>(
inputstream, jsonSchema, Movie.class)) {
reader.initialize();
ParquetWriter parquetWriter
= new AvroParquetWriter(outputPath, jsonSchema, compressionCodecName, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);
for (Movie record : reader) {
parquetWriter.write(record);
}
In the above code Movie
is my POJO class.
When I run the program I am facing the following exception :
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader
at com.mypackage.jsontoparquet.JsonToParquet.main(JsonToParquet.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.RecordReader
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
I am using JDK : 8.
I don't have any background of hadoop, so I am unable to understand it's root cause.
What is the issue ?
Upvotes: 0
Views: 1814
Reputation: 58832
Your kite is missing hadoop dependencies
there are some cases where you may have to provide the relevant Hadoop component dependencies yourself, and Kite has grouping dependencies for this purpose.
For Haddop2 (default) add to your pom:
<dependency>
<groupId>org.kitesdk</groupId>
<artifactId>kite-hadoop2-dependencies</artifactId>
<version>1.0.0</version>
<type>pom</type>
<scope>compile</scope>
</dependency>
Upvotes: 0
Reputation: 1360
Based on Kite-SDK Documentation, JSONFileReader
,ParquetWriter
and AvroParquetWriter
use Hadoop libraries to work. It is needed to add Hadoop dependencies in your pom. You need at least below dependencies. Add them in your pom.xml:
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-core</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-mapreduce-client-jobclient</artifactId>
<version>2.6.0</version>
</dependency>
Upvotes: 2