Reputation: 31
I am new to Hadoop and am learning the MapReduce paradigm. The tutorial I am following says that MapReduce applies two operations (map and reduce) to the key-value pairs of a file. I know that Hadoop also deals with unstructured data, so I was wondering how it handles MapReduce in that case.
Upvotes: 0
Views: 91
Reputation: 191864
Take this text as an example:
Hello
World
There are two lines of text, but there is naturally a key and a value: the file offset and the line itself. If you hex dump the file, you'd see something like this, where World starts at offset 0x6 because Hello plus its newline occupies six bytes:
0x0 Hello
0x6 World
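This is how a plaintext file stored in HDFS gets turned into records: the default TextInputFormat uses the byte offset as the key and the line as the value, so MapReduce (and other runtime engines) can read that data. HDFS itself cuts files into blocks by size; it is the input format that finds the line boundaries.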
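To make the offset/line pairing concrete, here is a rough sketch in plain Python. It is not Hadoop code; `offset_line_records` is a made-up helper that only imitates the kind of (key, value) pairs a line-oriented input format would hand to a mapper.

```python
def offset_line_records(data: bytes):
    """Yield (byte_offset, line) pairs from raw file bytes.

    Hypothetical helper: mimics the idea of offset-as-key, line-as-value.
    """
    offset = 0
    # keepends=True keeps the newline so the byte count stays accurate
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


records = list(offset_line_records(b"Hello\nWorld\n"))
print(records)  # [(0, 'Hello'), (6, 'World')]
```

Each tuple is what a map function would receive: the key is where the line starts, the value is the line.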
If you're storing video, images, audio, PDF documents, etc., then you must implement your own InputFormat (and its record reader) to determine how the bytes of the file should be structured into records and parallelized, if at all.
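As a rough illustration of the "if at all" case, the sketch below treats each binary file as one indivisible record, with the path as the key and the raw bytes as the value. This is plain Python, not the Hadoop API, and `whole_file_records` is a hypothetical name; in Hadoop the equivalent would be a custom InputFormat that refuses to split files and a record reader that emits one record per file.

```python
import os
import tempfile
from pathlib import Path


def whole_file_records(paths):
    """Yield one (path, bytes) record per file, never splitting a file.

    Hypothetical helper: stands in for a non-splittable input format.
    """
    for p in paths:
        yield str(p), Path(p).read_bytes()


# Usage: write a small fake "image" and read it back as a single record
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    Path(sample).write_bytes(b"\x89PNG\r\n")
    for key, value in whole_file_records([sample]):
        print(key, len(value))
```

Parallelism then comes from having many files rather than from cutting one file into pieces, which suits formats that cannot be decoded from an arbitrary byte position.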
Upvotes: 0