Reputation: 1114
Is there a way to create parquet files from java?
I have data in memory (java classes) and I want to write it into a parquet file, to later read it from apache-drill.
Is there an simple way to do this, like inserting data into a sql table?
GOT IT
Thanks for the help.
Combining the answers and this link, I was able to create a parquet file and read it back with drill.
Upvotes: 24
Views: 63545
Reputation: 2923
ParquetWriter's constructors are deprecated(1.8.1) but not ParquetWriter itself, you can still create ParquetWriter by extending abstract Builder subclass inside of it.
Here an example from parquet creators themselves ExampleParquetWriter:
public static class Builder extends ParquetWriter.Builder<Group, Builder> {
private MessageType type = null;
private Map<String, String> extraMetaData = new HashMap<String, String>();
private Builder(Path file) {
super(file);
}
public Builder withType(MessageType type) {
this.type = type;
return this;
}
public Builder withExtraMetaData(Map<String, String> extraMetaData) {
this.extraMetaData = extraMetaData;
return this;
}
@Override
protected Builder self() {
return this;
}
@Override
protected WriteSupport<Group> getWriteSupport(Configuration conf) {
return new GroupWriteSupport(type, extraMetaData);
}
}
If you don't want to use Group and GroupWriteSupport(bundled in Parquet but purposed just as an example of data-model implementation) you can go with Avro, Protocol Buffers, or Thrift in-memory data models. Here is an example using writing Parquet using Avro:
try (ParquetWriter<GenericData.Record> writer = AvroParquetWriter
.<GenericData.Record>builder(fileToWrite)
.withSchema(schema)
.withConf(new Configuration())
.withCompressionCodec(CompressionCodecName.SNAPPY)
.build()) {
for (GenericData.Record record : recordsToWrite) {
writer.write(record);
}
}
You will need these dependencies:
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-avro</artifactId>
<version>1.8.1</version>
</dependency>
<dependency>
<groupId>org.apache.parquet</groupId>
<artifactId>parquet-hadoop</artifactId>
<version>1.8.1</version>
</dependency>
Full example here.
Upvotes: 26
Reputation: 3105
A few possible ways to do it:
text_table
. See Impala's Create Table Statement for details.parquet_table
.insert into parquet_table select * from text_table
to copy all data from the text file to the parquet table.Upvotes: 3