Lianshuai

Reputation: 31

How to read Hadoop sequence file using Java

I have a sequence file generated by Spark with the saveAsObjectFile function. The file contains only int values, and I want to read it locally with Java. Here is my code:

    FileSystem fileSystem = null;
    SequenceFile.Reader in = null;
    try {
        fileSystem = FileSystem.get(conf);
        Path path = new Path("D:\\spark_sequence_file");
        in = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
        Writable key = (Writable)
                ReflectionUtils.newInstance(in.getKeyClass(), conf);
        BytesWritable value = new BytesWritable();
        while (in.next(key, value)) {
            byte[] val_byte = value.getBytes();
            int val = ByteBuffer.wrap(val_byte, 0, 4).getInt();
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

But I can't read it correctly: every record gives the same value, which is obviously wrong. Here is a snapshot of my output:

[image: output snapshot]

The file header looks like this:

[image: file header]

Can anybody help me?

Upvotes: 0

Views: 3043

Answers (1)

wandermonk

Reputation: 7386

In Hadoop, keys are usually of type WritableComparable and values are of type Writable. With that in mind, I read a sequence file like this:

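    // Imports: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.Path,
    // org.apache.hadoop.io.SequenceFile, org.apache.hadoop.io.Writable,
    // org.apache.hadoop.io.WritableComparable, org.apache.hadoop.util.ReflectionUtils
    Configuration config = new Configuration();
    Path path = new Path(PATH_TO_YOUR_FILE);
    // try-with-resources closes the reader even if reading fails
    try (SequenceFile.Reader reader = new SequenceFile.Reader(config, SequenceFile.Reader.file(path))) {
        // ReflectionUtils can also instantiate classes such as NullWritable, whose constructor is private
        WritableComparable<?> key = (WritableComparable<?>) ReflectionUtils.newInstance(reader.getKeyClass(), config);
        Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), config);
        while (reader.next(key, value)) {
            // do something with key and value
        }
    }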

The wrong values in your case are probably caused by writing the file with saveAsObjectFile() rather than saveAsSequenceFile(String path, scala.Option<Class<? extends org.apache.hadoop.io.compress.CompressionCodec>> codec).

Please try that method and see whether the issue persists.
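
If you have to keep the existing file, here is a rough sketch of why every record looks the same and how the values could be decoded. As far as I understand Spark's implementation, saveAsObjectFile stores each record value as a BytesWritable holding a Java-serialized array (a small batch of elements), so the first four bytes of every value are the Java serialization stream header, not one of your ints. The class and method names below (ObjectFileDecoder, decode, serialize, firstFourBytes) are made up for illustration, and the serialize helper only imitates what Spark writes so the example runs without Hadoop. The assumption that the batch is an int[] (Scala RDD[Int]) or an Object[] of Integer (JavaRDD<Integer>) should be checked against your own file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class ObjectFileDecoder {

    // Hypothetical helper: decodes one record value of an object file.
    // Pass value.getBytes() and value.getLength() from the BytesWritable,
    // because the backing array can be longer than the valid data.
    public static List<Integer> decode(byte[] buf, int length) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf, 0, length))) {
            Object batch = in.readObject();
            List<Integer> out = new ArrayList<>();
            if (batch instanceof int[]) {
                // Assumed layout when the RDD was a Scala RDD[Int]
                for (int v : (int[]) batch) {
                    out.add(v);
                }
            } else if (batch instanceof Object[]) {
                // Assumed layout when the RDD was a JavaRDD<Integer>
                for (Object o : (Object[]) batch) {
                    out.add((Integer) o);
                }
            } else {
                throw new IllegalStateException("Unexpected batch type: " + batch);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for the writer side: Java-serializes one batch into a byte array.
    public static byte[] serialize(Object batch) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(batch);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // What the code in the question computes for each record.
    public static int firstFourBytes(byte[] buf) {
        return ByteBuffer.wrap(buf, 0, 4).getInt();
    }

    public static void main(String[] args) {
        byte[] a = serialize(new int[] {1, 2, 3});
        byte[] b = serialize(new int[] {40, 50, 60});
        // Both lines print the serialization header, which is why all values look identical
        System.out.println(Integer.toHexString(firstFourBytes(a))); // aced0005
        System.out.println(Integer.toHexString(firstFourBytes(b))); // aced0005
        // Deserializing the whole value recovers the actual numbers
        System.out.println(decode(a, a.length)); // [1, 2, 3]
        System.out.println(decode(b, b.length)); // [40, 50, 60]
    }
}
```

In your reading loop you would call something like decode(value.getBytes(), value.getLength()) for each record instead of wrapping the first four bytes in a ByteBuffer.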

Upvotes: 1
