Chetan Shirke

Reputation: 906

How to rename files with specific extension using pig or hadoop fs option?

I have *.gz files in a date-partitioned (year/month/day) folder structure. I want to rename the files that end with the .gz extension.

Please suggest how to rename files with a specific extension using the hadoop fs command line or Pig.

Here is my folder structure:

----root folder
    |
     ---year
        -- month
         -- day
          -- filename*.gz

I want to rename the files with the .gz extension. Please suggest how to achieve this.

Upvotes: 1

Views: 1279

Answers (1)

Viacheslav Rodionov

Reputation: 2345

I know it's a dirty hack, but it works for me. I assume you want to change the .gz file extension to .newextension:

hadoop fs -ls root/*/*/*/filename*.gz | grep '\.gz$' \
| awk '{print "hadoop fs -mv " $NF " " $NF}' | rev \
| cut -c 4- | rev | sed -e 's/$/.newextension/' | bash

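The rev | cut -c 4- | rev part strips the last three characters (.gz) from each generated command, so adjust the cut -c 4- offset if your extension has a different length. Before running it for real, I suggest writing the output to a file instead of piping it straight to bash: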

hadoop fs -ls root/*/*/*/filename*.gz | grep '\.gz$' \
| awk '{print "hadoop fs -mv " $NF " " $NF}' | rev \
| cut -c 4- | rev | sed -e 's/$/.newextension/' > rename_script.sh

Review the generated script, and when you're satisfied with it, run it:

bash rename_script.sh
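
If the rev/cut trick feels too fragile, the same idea can be sketched with shell parameter expansion, which strips the old extension by name rather than by character count. This is only a sketch: gen_rename_cmds is a hypothetical helper name, and it assumes paths contain no whitespace or single quotes (the awk $NF step would split on whitespace anyway).

```shell
# Hypothetical helper (not part of Hadoop): reads one HDFS path per line on
# stdin and prints a "hadoop fs -mv" command for every path that ends with
# the old extension. Assumes paths contain no whitespace or single quotes.
gen_rename_cmds() {
  old_ext="$1"
  new_ext="$2"
  while IFS= read -r path; do
    case "$path" in
      *"$old_ext")
        # ${path%"$old_ext"} removes the old extension from the end of the path
        printf "hadoop fs -mv '%s' '%s'\n" "$path" "${path%"$old_ext"}$new_ext"
        ;;
    esac
  done
}

# Example usage (commented out because it needs a Hadoop client):
# hadoop fs -ls root/*/*/*/filename*.gz | awk '{print $NF}' \
#   | gen_rename_cmds .gz .newextension > rename_script.sh
```

As before, inspect rename_script.sh before running it with bash.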

Upvotes: 1
