Reputation: 33293
I have two datasets.
main_data.txt
{"id":"foo", "some_field":12354, "score":0}
{"id":"foobar", "some_field":12354, "score":0}
score_data.txt
{"id":"foo", "score":1}
{"id":"foobar","score":20}
....
In main_data, score is initialized to 0. main_data and score_data have some ids in common.
For the ids that are common, I want to replace "score" in main_data with the score from score_data.
If an id is absent from score_data, I want to leave the score at 0.
Upvotes: 0
Views: 79
Reputation: 34673
Why do you have "score" initialized to 0? You could simply skip that and join main_data (LEFT OUTER) with score_data. Whether you skip it or not, this should work:
main_data = LOAD USING SOME STORAGE; -- assume we have id as a column
score_data = LOAD USING SOME STORAGE; -- assume we have id, score as columns
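-- inside the BY clause, refer to the field by its plain name; the :: prefix only applies after the join
joined_data = JOIN main_data BY id LEFT OUTER, score_data BY id;
-- ids with no match in score_data come back with a NULL score, so fall back to 0
results = FOREACH joined_data GENERATE main_data::id, (score_data::score IS NULL ? 0 : score_data::score) AS score;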
STORE results USING SOMETHING SOMEWHERE;
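To make the intended result concrete, here is a minimal sketch of the same left-outer-join-with-default logic in plain Python (not Pig), run on the sample records from the question. The function name `merge_scores` and the extra id "baz" are made up for illustration, and the sketch assumes each id appears at most once in score_data.

```python
import json


def merge_scores(main_lines, score_lines):
    """Keep every main record; take the score from score_data when the id
    is present there, otherwise fall back to 0 (left outer join + default)."""
    # Build an id -> score lookup from score_data.
    scores = {}
    for line in score_lines:
        rec = json.loads(line)
        scores[rec["id"]] = rec["score"]

    merged = []
    for line in main_lines:
        rec = json.loads(line)
        # A missing id plays the role of the NULL produced by the outer join.
        rec["score"] = scores.get(rec["id"], 0)
        merged.append(rec)
    return merged


main_lines = [
    '{"id":"foo", "some_field":12354, "score":0}',
    '{"id":"foobar", "some_field":12354, "score":0}',
    '{"id":"baz", "some_field":12354, "score":0}',  # hypothetical id with no score
]
score_lines = [
    '{"id":"foo", "score":1}',
    '{"id":"foobar", "score":20}',
]

result = merge_scores(main_lines, score_lines)
for rec in result:
    print(json.dumps(rec))
```

Unlike the Pig script above, this sketch keeps the other fields of each main record (such as some_field) alongside the replaced score.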
Upvotes: 1