Hadoop Load and Store

Question

When I try to run a Pig script that has two STORE statements writing to the same location, like this:

store Alert_Message_Count into 'out';
store Warning_Message_Count into 'out';

it hangs: it does not proceed after reporting 50% complete.

Is this wrong? Can't we store both results in the same file (folder)?

Chris White · Accepted Answer

Normally Hadoop MapReduce won't let you write job output to a folder that already exists, so I would guess that this isn't possible either (since Pig translates the commands into a series of MapReduce steps). However, I would expect some kind of error message rather than a hang.
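If the two results do not need to share one folder, a minimal sketch of the usual workaround is to give each STORE its own output directory that does not exist yet. The paths 'out/alert' and 'out/warning' are hypothetical, chosen only for illustration.

```pig
-- Sketch: each relation gets its own, not-yet-existing output directory.
-- 'out/alert' and 'out/warning' are hypothetical paths; use any unused locations.
store Alert_Message_Count into 'out/alert';
store Warning_Message_Count into 'out/warning';
```

Each directory then holds only the part files for its own relation, so the two jobs never compete for the same output path.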

If you open the cluster's JobTracker and look at the logs for the task, do they show anything that could help diagnose this further?

It might also be worth asking on the Pig mailing lists, if you haven't already.

If you want to append one dataset to another, use the UNION operator:

grunt> All_Count = UNION Alert_Message_Count, Warning_Message_Count;
grunt> store All_Count into 'out';
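Because the output directory must not already exist, running the same STORE a second time will fail once 'out' is there. A hedged sketch for re-runs in the Grunt shell is to delete the old output first with `rmf`, which removes the path without complaining if it is missing. Note that this permanently deletes whatever is in 'out'.

```pig
-- Grunt session sketch: delete the output from an earlier run, then store again.
-- rmf removes the path and, unlike rm, does not fail when the path is absent.
grunt> rmf out
grunt> store All_Count into 'out';
```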
