Reputation: 1467
According to "Hadoop: The Definitive Guide", the input format TextInputFormat gives key-value pairs (k, v) = (byte offset, line). However, in MRJob the key in the mapper input is always None. It should be easy to get the byte offset as the key, since that is what TextInputFormat does. How do I get it?
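To make the (byte offset, line) semantics concrete, here is a minimal pure-Python sketch that reproduces the shape of TextInputFormat's records for an in-memory byte string. It is an illustration only, not Hadoop's implementation; the function name text_input_records is hypothetical. The point it shows is that the key counts bytes, not characters, and includes the line terminator of every preceding line.

```python
def text_input_records(data: bytes, encoding: str = "utf-8"):
    """Yield (byte_offset, line) pairs shaped like TextInputFormat records.

    The key is the position of the line's first byte in the data; the value
    is the line with its terminator (LF, CR or CRLF) removed.
    """
    offset = 0
    # bytes.splitlines(keepends=True) splits on \n, \r and \r\n and keeps the
    # terminator, so len(raw) is the line's full size in bytes.
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode(encoding)
        offset += len(raw)


if __name__ == "__main__":
    # "é" is two bytes in UTF-8 and "\r\n" is two bytes, which shifts "bar".
    sample = "hello world\nhéllo\r\nbar".encode("utf-8")
    for key, value in text_input_records(sample):
        print(key, value)
```

For the sample above this prints the keys 0, 12 and 20.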
I know that you can use the environment variable 'map_input_start' and calculate the byte offsets yourself, but this has caused problems, and I would prefer the much simpler approach of getting the offset directly as the key.
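The environment-variable workaround can be sketched as follows, assuming the variable is named map_input_start as in the question (newer Hadoop releases may expose it under a different name) and that every line ends with a terminator of a known, fixed size. The helper offsets_from_split_start is hypothetical. Two caveats are plausible sources of the problems mentioned in the question: Hadoop's line reader typically discards the partial first line of a split that does not begin at byte 0, and the mapper never sees those bytes, so offsets for later splits can be shifted; and a file mixing LF and CRLF terminators breaks the fixed newline_bytes assumption.

```python
import os


def offsets_from_split_start(lines, start=None, newline_bytes=1, encoding="utf-8"):
    """Hypothetical helper: rebuild TextInputFormat-style byte offsets in a
    streaming mapper from the split's start offset.

    Assumes (1) `lines` arrive without terminators, in file order, and the
    first one begins exactly at byte `start`; (2) every line was terminated by
    exactly `newline_bytes` bytes; (3) the start offset is exported in the
    environment variable map_input_start, as named in the question.
    """
    if start is None:
        start = int(os.environ.get("map_input_start", "0"))
    offset = start
    for line in lines:
        yield offset, line
        # Advance by the encoded size of the line plus its terminator.
        offset += len(line.encode(encoding)) + newline_bytes
```

Because an MRJob mapper is called once per line, in practice the running offset would live on the job instance (initialised in mapper_init) rather than in a generator; the arithmetic is the same.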
Upvotes: 0
Views: 1017
Reputation: 603
Doesn't defining the map method in your mapper class with the following signature give you the byte offset as the key?
public void map(LongWritable key, Text value, OutputCollector<K2, V2> output, Reporter reporter) throws IOException
Upvotes: 0
Reputation: 10642
TextInputFormat is a Java class ... I do not see how that would work in the streaming world, which is where MRJob runs (via Hadoop Streaming).
Upvotes: 0