Reputation: 31
I have a sequence file generated by Spark using the saveAsObjectFile function. The file contents are just some int numbers, and I want to read the file locally with Java. Here is my code:
FileSystem fileSystem = null;
SequenceFile.Reader in = null;
try {
    fileSystem = FileSystem.get(conf);
    Path path = new Path("D:\\spark_sequence_file");
    in = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
    Writable key = (Writable) ReflectionUtils.newInstance(in.getKeyClass(), conf);
    BytesWritable value = new BytesWritable();
    while (in.next(key, value)) {
        byte[] val_byte = value.getBytes();
        int val = ByteBuffer.wrap(val_byte, 0, 4).getInt();
    }
} catch (IOException e) {
    e.printStackTrace();
}
But I can't read it correctly: I get the same value for every record, and it is obviously wrong.
Can anybody help me?
Upvotes: 0
Views: 3043
Reputation: 7386
In Hadoop, keys are usually of type WritableComparable and values are of type Writable. Keeping this basic concept in mind, I read a sequence file the following way:
Configuration config = new Configuration();
Path path = new Path(PATH_TO_YOUR_FILE);
SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(config), path, config);
WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
Writable value = (Writable) reader.getValueClass().newInstance();
while (reader.next(key, value)) {
    // do something with key and value
}
reader.close();
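For your int data, once the file really holds Writable ints (see the suggestion below), the placeholder in the loop could be filled in roughly like this; the casts assume the key and value classes are both IntWritable, which is an assumption on my part rather than something taken from your file:

    // Inside the while loop above, assuming IntWritable keys and values:
    int k = ((IntWritable) key).get();
    int v = ((IntWritable) value).get();
    System.out.println(k + " -> " + v);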
The data issue in your case is most likely because you saved the data with saveAsObjectFile()
rather than with saveAsSequenceFile(String path, scala.Option<Class<? extends org.apache.hadoop.io.compress.CompressionCodec>> codec).
Please try saveAsSequenceFile and see whether the issue persists.
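Note that saveAsSequenceFile belongs to Spark's Scala RDD API. If you are writing from Java, a roughly equivalent route (my assumption, not something from your post) is to build a JavaPairRDD of Writables and save it with saveAsHadoopFile plus SequenceFileOutputFormat. A minimal sketch; the numbers and the output path are placeholders:

import java.util.Arrays;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WriteIntSequenceFile {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("write-int-seq").setMaster("local[*]"));

        // Save (IntWritable, IntWritable) pairs as a plain Hadoop sequence file.
        sc.parallelize(Arrays.asList(1, 2, 3, 4, 5))
          .mapToPair(i -> new Tuple2<>(new IntWritable(i), new IntWritable(i)))
          .saveAsHadoopFile("D:\\spark_sequence_file_ints",
                  IntWritable.class, IntWritable.class, SequenceFileOutputFormat.class);

        sc.stop();
    }
}

A file written this way can then be read with the reader loop above; reader.getKeyClass() and reader.getValueClass() will both report IntWritable.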
Upvotes: 1