Reputation: 65
Currently I use the following code to append to an existing SequenceFile:
// initialize sequence writer (CREATE | APPEND: create the file, or open it for append)
Writer writer = SequenceFile.createWriter(
    FileContext.getFileContext(this.conf),
    this.conf,
    new Path("/tmp/sequencefile"),
    Text.class,
    BytesWritable.class,
    CompressionType.NONE,
    null,
    new Metadata(),
    EnumSet.of(CreateFlag.CREATE, CreateFlag.APPEND),
    CreateOpts.blockSize(64 * 1024 * 1024));
// append one record
writer.append(key, value);
// flush to disk, then close the writer
writer.hsync();
writer.close();
Everything works if the SequenceFile does not exist yet, but when the file already exists Hadoop writes the SequenceFile header (SEQ ...) again in the middle of the file, and the file becomes unreadable for Hadoop.
I use Hadoop 2.6.0.
Upvotes: 2
Views: 1936
Reputation: 396
I don't think it is possible to append to an existing SequenceFile. I analyzed the source code of 2.5.2 and 2.6.0-CDH5.5: every constructor of Writer writes the SequenceFile header (from the init function).
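To illustrate why that corrupts the file, here is a toy model in plain Java. It is not Hadoop's code: the class ToyHeaderFile, its methods, and the three-byte "SEQ" header are all made up for the sketch. It only shows what happens when a header is written unconditionally on every open, compared with a writer that checks whether the file already has content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Toy model only (hypothetical names); this is NOT Hadoop's SequenceFile code.
class ToyHeaderFile {
    static final byte[] HEADER = "SEQ".getBytes(StandardCharsets.US_ASCII);

    // Mirrors the behaviour described above: the header is written on every open,
    // so a second call puts a second header in the middle of the file.
    static void naiveAppend(Path file, String record) throws IOException {
        write(file, HEADER);
        write(file, record.getBytes(StandardCharsets.US_ASCII));
    }

    // Append-aware variant: write the header only if the file is missing or empty.
    static void headerAwareAppend(Path file, String record) throws IOException {
        if (!Files.exists(file) || Files.size(file) == 0) {
            write(file, HEADER);
        }
        write(file, record.getBytes(StandardCharsets.US_ASCII));
    }

    // Create the file if needed and always append at the end.
    private static void write(Path file, byte[] bytes) throws IOException {
        Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Read the whole file back as ASCII text for inspection.
    static String contents(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.US_ASCII);
    }
}
```

Calling naiveAppend twice on the same path leaves two headers in the file, which is the same shape of corruption as in the question; headerAwareAppend leaves one.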
There is a patch that adds this feature, https://issues.apache.org/jira/browse/HADOOP-7139, but it has not made it into an official release yet.
UPDATE: HADOOP-7139 is now closed, and from version 2.6.1 / 2.7.2 it is possible to append to an existing SequenceFile :)
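On a release that includes that fix, appending should look roughly like the sketch below, which uses the option-based createWriter overload together with the SequenceFile.Writer.appendIfExists(true) option added by HADOOP-7139. The class name AppendToSequenceFile and the sample key and value are placeholders; the path is the one from the question. As far as I can tell, the key class, value class and compression settings must match those of the existing file, otherwise the writer refuses to open it.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;

// Sketch only: assumes Hadoop 2.6.1 / 2.7.2 or later on the classpath.
public class AppendToSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("/tmp/sequencefile"); // path taken from the question

        // appendIfExists(true): reuse the existing header instead of writing a new one;
        // if the file does not exist, it is created as usual.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.compression(CompressionType.NONE),
                SequenceFile.Writer.appendIfExists(true))) {
            // placeholder record
            Text key = new Text("example-key");
            BytesWritable value =
                new BytesWritable("example-value".getBytes(StandardCharsets.UTF_8));
            writer.append(key, value);
            writer.hsync(); // flush to disk before the writer is closed
        }
    }
}
```

Running it twice against the same path should leave a single header followed by two records, which a SequenceFile.Reader can then read back normally.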
Upvotes: 1