Reputation: 549
I've seen examples of writing a sequence file into HDFS using either the org.apache.hadoop.fs package or MapReduce. My question is: when I used org.apache.hadoop.fs to write a sequence file and then viewed the result with hadoop fs -text, the "key" was still attached to each record/block. Would it be the same if I used MapReduce to produce the sequence file? I'd rather not see the "key".
Upvotes: 0
Views: 2680
Reputation: 913
In a sequence file you write your content as serialized objects, e.g. your own custom Writable object, whereas a text file is just one string per line.
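A minimal sketch of such a custom Writable (the LogRecord name and its fields are hypothetical) that could serve as the value type of a SequenceFile record:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    // Hypothetical record type: a timestamped log message.
    public class LogRecord implements Writable {
        private long timestamp;
        private String message;

        public LogRecord() { }  // Hadoop needs a no-arg constructor for deserialization

        public LogRecord(long timestamp, String message) {
            this.timestamp = timestamp;
            this.message = message;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(timestamp);   // serialize the fields in a fixed order
            out.writeUTF(message);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            timestamp = in.readLong();  // deserialize in exactly the same order
            message = in.readUTF();
        }
    }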
Upvotes: 1
Reputation: 795
The Apache Hadoop Wiki states that "SequenceFile is a flat file consisting of binary key/value pairs". The Wiki shows the actual file format, which includes the key. Note that SequenceFiles support multiple formats: "Uncompressed", "Record Compressed", and "Block Compressed". Additionally, various compression codecs can be used. Since the file format and compression information are stored in the file header, applications (such as Mapper and Reducer tasks) can easily determine how to correctly process the files.
The append() method on the org.apache.hadoop.io.SequenceFile.Writer class requires both a key and a value.
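A minimal writer sketch to illustrate this (the output path and key/value types are assumptions, and it uses the Hadoop 2.x option-based createWriter() API): it creates a block-compressed SequenceFile, and append() always takes both a key and a value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriteDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path("/tmp/demo.seq");  // hypothetical output path

            SequenceFile.Writer writer = null;
            try {
                writer = SequenceFile.createWriter(conf,
                        SequenceFile.Writer.file(path),
                        SequenceFile.Writer.keyClass(IntWritable.class),
                        SequenceFile.Writer.valueClass(Text.class),
                        // the compression type is recorded in the file header
                        SequenceFile.Writer.compression(CompressionType.BLOCK));

                IntWritable key = new IntWritable();
                Text value = new Text();
                for (int i = 0; i < 100; i++) {
                    key.set(i);
                    value.set("record-" + i);
                    writer.append(key, value);  // append() requires a key AND a value
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
    }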
Also keep in mind that both the MapReduce Mapper and the Reducer ingest and emit key-value pairs, so having the key stored in the SequenceFile allows Hadoop to operate very efficiently on these types of files.
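To answer the MapReduce half of the question, here is a sketch of a map-only job that writes its output as a SequenceFile via SequenceFileOutputFormat (the class names and the choice of line length as the key are made up for the example); note that the mapper, too, must emit a key with every value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class SeqFileJob {

        // Hypothetical mapper: keys each input line by its length.
        public static class LineMapper extends Mapper<Object, Text, IntWritable, Text> {
            private final IntWritable outKey = new IntWritable();

            @Override
            protected void map(Object key, Text value, Context context)
                    throws java.io.IOException, InterruptedException {
                outKey.set(value.getLength());
                context.write(outKey, value);  // a mapper always emits key-value pairs
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "write-seqfile");
            job.setJarByClass(SeqFileJob.class);
            job.setMapperClass(LineMapper.class);
            job.setNumReduceTasks(0);  // map-only: mapper output goes straight to the output format
            job.setOutputKeyClass(IntWritable.class);
            job.setOutputValueClass(Text.class);
            job.setOutputFormatClass(SequenceFileOutputFormat.class);  // write a SequenceFile
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }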
So in a nutshell: the key is an integral part of the SequenceFile format itself, so you will see it whether the file was written through the fs API or through MapReduce, and hadoop fs -text will print it alongside each value.
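If you just want to look at the records without the keys, one workaround (a sketch, not the only approach) is to read the file back with SequenceFile.Reader and print only the values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SequenceFileValueDump {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path path = new Path(args[0]);  // path to the sequence file

            SequenceFile.Reader reader = null;
            try {
                reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
                // Instantiate holders of whatever key/value types the file header declares.
                Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                while (reader.next(key, value)) {
                    System.out.println(value);  // print only the value, skipping the key
                }
            } finally {
                if (reader != null) {
                    reader.close();
                }
            }
        }
    }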
Upvotes: 0