Ale

Reputation: 665

How to merge part files and headers in cloudera

I have a large table, generated in Hue with the Pig Editor, that contains several hundred thousand records. Pig returns several part files plus separate .pig_header and .pig_schema files. I need all the part files and the header combined into one complete table in .txt format. I can do this with the getmerge command:

    -- Delete the schema file from the output folder
    fs -rm /OUTPUT_folder/.pig_schema
    -- Merge the header and all the part files from the output folder and save the result as a .txt file
    fs -getmerge /OUTPUT_folder/* /Another_folder/Result.txt

Is there any way in Cloudera to get this complete table without using the getmerge command?

Maybe there is software in Cloudera, or a command, that combines the part files in one step.

After that, I just need to open this table with all the columns and headers displayed in a nicely ordered way. What is the best tool for this in Hue?

Upvotes: 0

Views: 813

Answers (1)

Romain

Reputation: 7082

You could try a final GROUP ... ALL with an ORDER BY, followed by a FOREACH ... GENERATE FLATTEN(). That way all the records go to a single reducer and so end up in only one file.
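
A minimal sketch of that idea, under stated assumptions: the relation name `data`, the fields `id` and `name`, the tab delimiter, and the `/INPUT_folder` path are all hypothetical placeholders, not taken from the question. Only `/OUTPUT_folder` comes from the original post.

```pig
-- Hypothetical input: replace the path, delimiter, and schema with your own.
data = LOAD '/INPUT_folder' USING PigStorage('\t') AS (id:int, name:chararray);

-- GROUP ... ALL puts every record into one group,
-- so the whole relation is handled by a single reducer.
grouped = GROUP data ALL;

-- Sort the records inside that single group, then flatten the bag back into rows.
-- The AS clause restores plain field names (assumed schema) for the header.
single = FOREACH grouped {
    sorted = ORDER data BY id;
    GENERATE FLATTEN(sorted) AS (id:int, name:chararray);
};

-- The '-schema' option makes PigStorage write .pig_header and .pig_schema
-- next to the (now single) part file.
STORE single INTO '/OUTPUT_folder' USING PigStorage('\t', '-schema');
```

Because everything passes through one reducer, this is probably only reasonable for tables of moderate size, such as the few hundred thousand records mentioned in the question. The header still lands in its own .pig_header file, so a final concatenation step may still be needed if the header must sit inside the same .txt file.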

Upvotes: 0
