Engineiro

Reputation: 1146

Interpreting Cascading dot diagrams

Can someone explain how to read these diagrams? I understand the flow from head to tail, but I am specifically wondering how to read the field sets (the bracketed labels) on the transitions between ellipses (Pipes/Taps).

Take the Fields following the Every Pipe in the image as an example. As far as I can tell, the first Fields set, i.e. [{2}:'token', 'count'], is what goes into the next Pipe/Tap. But what is the significance of the second Fields set, [{1}: 'token']?

Is this the field set that went into the previous Pipe above? Does the second bracket have any programmatic significance, i.e. can we access it within that Pipe with particular Cascading code (in the case where the second Fields set is larger than the first)?

[Image: wc Impatient PNG (source: cascading.org)]

Upvotes: 2

Views: 340

Answers (1)

Brian Ethier

Reputation: 169

The second field set shows which fields are available to subsequent operations in that map or reduce step.

In your example above, in the reduce step, since you grouped by 'token', only 'token' is available to subsequent aggregations (Everys) in that reduce step. You could, for example, add another aggregation that outputs the average token length, but you could not yet add an aggregation that uses 'count'.

The reason for this behaviour is that subsequent aggregations on the same group happen in parallel. Thus the Count will not have completed in time to feed into any other aggregations you chain on.
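To make the point above concrete, here is a minimal plain-Java sketch of one reduce-side group. It is only a model of the behaviour described in this answer, not the Cascading API: the class `GroupScopeSketch`, the method `aggregateGroup`, and the field name `avg_length` are all hypothetical names made up for illustration. Each aggregator is handed the grouped values only, so it has no way to read 'count' or any other sibling aggregator's output.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of a single group in the reduce step (NOT Cascading code).
class GroupScopeSketch {

    // Applies every aggregator to the same list of grouped values.
    // Aggregators never receive the result map, so none of them can
    // depend on a field produced by another aggregator in this step.
    static Map<String, Object> aggregateGroup(
            String token,
            List<String> groupValues,
            Map<String, Function<List<String>, Object>> aggregators) {
        Map<String, Object> result = new LinkedHashMap<>();
        result.put("token", token); // the grouping field is visible to everything
        for (Map.Entry<String, Function<List<String>, Object>> e : aggregators.entrySet()) {
            result.put(e.getKey(), e.getValue().apply(groupValues));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Function<List<String>, Object>> aggregators = new LinkedHashMap<>();
        // Stand-in for the Count aggregation on the group.
        aggregators.put("count", values -> (long) values.size());
        // A second aggregation that needs only 'token', so it is allowed.
        aggregators.put("avg_length", values ->
                values.stream().mapToInt(String::length).average().orElse(0.0));

        List<String> group = Arrays.asList("rain", "rain", "rain");
        // prints {token=rain, count=3, avg_length=4.0}
        System.out.println(aggregateGroup("rain", group, aggregators));
    }
}
```

In this model, an aggregation that needed 'count' would have to run in a later step, after the group's results have been emitted, which matches the diagram showing 'count' only in the first (outgoing) field set.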

Upvotes: 2
