Marko Galesic

Reputation: 512

Pig: Load a table, then overwrite that table after transformation

Let's say I have a table:

db.table

I load the table, do some transforms on it, and finally attempt to store it:

mytable = LOAD 'db.table' USING HCatLoader();

.
.
-- My transforms
.
.

STORE mytable_final INTO 'db.table' USING HCatStorer();

But Pig complains that I'm writing into a table that already contains data.

I've looked at this JIRA ticket, which seems to be inactive (I have tried using FORCE and OVERWRITE in several places in the STORE command).

I've also looked at this SO post, but its author loads from one location and stores to a different one. If I use the approach from that post, the transformation produces no data. Deleting the files isn't an option. I'm thinking of storing the result in a temporary location first, but I don't know whether that is the best option.

I am trying to get the behavior you get in Hive using INSERT OVERWRITE.

Upvotes: 1

Views: 1623

Answers (1)

reo katoa

Reputation: 5811

I am not familiar with HCatLoader and HCatStorer. But if you LOAD from and STORE to HDFS, Pig provides shell commands that let you delete and move files from within your script.

STORE A INTO '/this/path/is/temporary';
RMF '/this/path/is/permanent';
MV '/this/path/is/temporary' '/this/path/is/permanent';
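
For context, the three lines above could sit in a full script roughly like this. It is only a sketch under assumptions: the tab-delimited PigStorage format, the two-column schema, and the FILTER step are hypothetical stand-ins for the real data and transforms, and it covers plain HDFS paths only, not HCatalog tables.

```pig
-- Hypothetical schema and delimiter; replace with the real ones.
A = LOAD '/this/path/is/permanent' USING PigStorage('\t')
    AS (id:int, name:chararray);

-- Stand-in for the real transforms.
B = FILTER A BY id IS NOT NULL;

-- Write the result to a separate location first, so the input
-- is never overwritten while it is still being read.
STORE B INTO '/this/path/is/temporary' USING PigStorage('\t');

-- After the store: remove the old data, then move the new data into place.
-- Written lowercase and unquoted here, as in the Pig documentation examples.
rmf /this/path/is/permanent
mv /this/path/is/temporary /this/path/is/permanent
```

Note that the remove and the move are two separate steps, so for a short time the permanent path does not exist. Anything reading that path concurrently may fail during that window.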

Upvotes: 2
