Garfield

Reputation: 436

How to use MapReduce when the key is not in the first line (Hadoop MapReduce)

Can anyone guide me on how to solve the problem below using MapReduce in Hadoop?

Let's say I have a file with the structure below.

I want the output to be a concatenated string of the key and value, as in the output below.

Upvotes: 0

Views: 113

Answers (1)

MaC

Reputation: 538

Yes, you can solve this in several ways, depending on the structure and size of your data and files. With a bit more information we could give you a more accurate answer, but two options are:

  1. Use the CombineFileInputFormat class if you have the same fields within the same file.
  2. Prevent splitting by subclassing FileInputFormat and overriding its isSplitable() method to return false.

You could also look at the KeyValueTextInputFormat class, which reads files line by line and uses the text before a separator as the key, instead of the line offset. You can specify the separator (the comma) via the mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property; the default separator is a tab.
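To make the splitting rule concrete, here is a minimal plain-Java sketch of roughly what happens to each line when the separator is a comma: the text before the first separator becomes the key, the rest becomes the value, and a line without a separator becomes a key with an empty value. This is not Hadoop code; the class name, the splitLine helper, and the sample lines are all made up for illustration, since the actual file layout was not shown in the question.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical stand-in for the per-line key/value split; not part of Hadoop.
class KeyValueSplitSketch {

    // Splits a line at the FIRST occurrence of the separator.
    // If the separator is absent, the whole line is the key and the value is empty.
    static Map.Entry<String, String> splitLine(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        // Made-up sample lines, only to show the behaviour.
        String[] sampleLines = {"id1,alpha,beta", "id2,gamma", "noSeparator"};
        for (String line : sampleLines) {
            Map.Entry<String, String> kv = splitLine(line, ',');
            System.out.println("key=" + kv.getKey() + " value=" + kv.getValue());
        }
    }
}
```

In a real job the mapper would receive these key/value pairs as Text objects and could concatenate them into whatever output string you need.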

I hope this helps.

Upvotes: 1
