loolkzey

Reputation: 13

What do people mean by "intermediate results" when talking about Hadoop, Spark, and Big Data?

I'm trying to learn a bit more about big data, particularly with regard to Hadoop and Spark. However, I keep seeing the term "intermediate results" and I'm not quite sure what it refers to.

For example, I read that "Hadoop writes intermediate results to a computer's storage disk, while Spark keeps those same results in memory whenever possible." I assumed this referred to the results after MapReduce, but I'm not sure.

Can someone go into a bit more detail about what "intermediate results" are and how they differ between Spark and Hadoop?

Upvotes: 0

Views: 395

Answers (1)

OneCricketeer

Reputation: 191963

Between the map phase and the reduce phase, a shuffle and sort operation is performed on the data being processed. The map output that passes through that step is intermediate to the whole job: it is neither the original input nor the final result.
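To make that concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop or Spark APIs; the function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`) and the sample input are hypothetical and only model the three steps.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (word, 1) pair per word. These pairs are the intermediate results."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_and_sort(pairs):
    """Group the intermediate pairs by key and return the groups sorted by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Collapse each group of values into a single count per key."""
    return {key: sum(values) for key, values in grouped}


# Hypothetical sample input, standing in for lines read from a file.
lines = ["big data big", "data spark"]

intermediate = map_phase(lines)           # output of the map phase
grouped = shuffle_and_sort(intermediate)  # still intermediate: grouped and sorted
counts = reduce_phase(grouped)            # final result

print(intermediate)
print(grouped)
print(counts)
```

Here `intermediate` and `grouped` are the intermediate results: they exist only to feed the reduce step. The sentence quoted in the question is about where a framework keeps data like this between steps, on disk for Hadoop and in memory where possible for Spark.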

Upvotes: 0
