Reputation: 13
I'm trying to learn a bit more about big data, particularly Hadoop and Spark. However, I keep seeing the term "intermediate results" and I'm not sure what it refers to.
For example, I read that "Hadoop writes intermediate results to a computer's storage disk, while Spark keeps those same results in memory whenever possible." I assumed this meant the results after MapReduce, but I'm not sure.
Can someone explain in more detail what "intermediate results" are and how they differ between Spark and Hadoop?
Upvotes: 0
Views: 395
Reputation: 191963
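Between the map phase and the reduce phase, a shuffle and sort operation is performed on the data being processed. The map output that goes through that step, i.e. the key-value pairs that exist after mapping but before reducing, is what "intermediate results" refers to: it is intermediate to the whole operation, not the final output. In Hadoop MapReduce that map output is written to local disk before the reducers fetch it, whereas Spark tries to keep data passed between processing steps in memory, which is the difference the sentence you quoted describes.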
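To make the term concrete, here is a minimal word-count sketch in plain Python. It is only an illustration of the three steps, not Hadoop or Spark code: the function names `map_phase`, `shuffle_and_sort`, and `reduce_phase` and the sample input are made up for this example, and everything runs in one process, in memory.

```python
from collections import defaultdict

def map_phase(lines):
    """Emit one (word, 1) pair per word; these pairs are the intermediate results."""
    return [(word.lower(), 1) for line in lines for word in line.split()]

def shuffle_and_sort(pairs):
    """Group the intermediate values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    """Collapse each key's list of values into a single count (the final result)."""
    return {key: sum(values) for key, values in grouped}

# Hypothetical sample input, just for illustration.
lines = ["spark keeps data in memory", "hadoop writes data to disk"]

intermediate = map_phase(lines)           # exists only between map and reduce
grouped = shuffle_and_sort(intermediate)  # still intermediate: grouped by key
counts = reduce_phase(grouped)            # final output of the job

print(intermediate[:3])
print(grouped[:2])
print(counts["data"])
```

Here `intermediate` and `grouped` are the "intermediate results": data that only exists so the reduce step can produce `counts`. Where a real framework keeps that data while the job runs (on disk or in memory) is the part the quoted comparison is about.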
Upvotes: 0