Reputation: 99
I want to remove all files from an HDFS location except one, but I am unable to find a solution for it.
I have tried shopt -s extglob
followed by hadoop fs -rm location/!(filename)
but it did not work.
Upvotes: 3
Views: 3198
Reputation: 11
I came up with a solution based on vikrant rana's answer. It does not require running the rm command multiple times, and it doesn't need to store the files in an array, which reduces both lines of code and effort:
hadoop fs -ls /user/xxxx/dev/hadoop/external/csvfiles| grep -v 'a_file_pattern_to_search' | awk '{print $8}' | xargs hadoop fs -rm
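To see what the grep/awk stages of that pipeline hand to xargs, you can exercise them on a simulated listing first. The listing and file names below are hypothetical; real `hadoop fs -ls` output has the file path in the 8th column, which is what `awk '{print $8}'` extracts:

```shell
# Simulated `hadoop fs -ls` output (hypothetical paths).
listing='-rw-r--r--   3 user group 1024 2019-01-01 10:00 /user/xxxx/dev/hadoop/external/csvfiles/keep_me.csv
-rw-r--r--   3 user group 2048 2019-01-01 10:00 /user/xxxx/dev/hadoop/external/csvfiles/_SUCCESS
-rw-r--r--   3 user group  512 2019-01-01 10:00 /user/xxxx/dev/hadoop/external/csvfiles/old_data.csv'

# grep -v drops the file(s) you want to keep; awk extracts the path column.
to_delete=$(printf '%s\n' "$listing" | grep -v 'keep_me' | awk '{print $8}')
echo "$to_delete"
```

Only the two paths that do not match the kept pattern survive the filter, so those are the ones xargs would pass to hadoop fs -rm.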
Upvotes: 1
Reputation: 99
Using the following code I am able to remove all files from the HDFS location at once, except the file that is needed.
file_arr=()
# Collect every file in the directory except those matching 'part-'
for file in $(hadoop fs -ls /tmp/table_name/ | grep -v 'part-' | awk '{print $8}')
do
  file_arr+=("$file")
done
# Delete them all with a single rm invocation
hadoop fs -rm "${file_arr[@]}"
Upvotes: 0
Reputation: 4674
The best option would be to copy the specific file to another directory, delete all the remaining files in the target directory, and then move the specific file back into it.
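A minimal sketch of that copy-aside approach, assuming the file to keep is /tmp/table_name/part-00000.csv and a scratch directory /tmp/keep_one_file (all paths here are hypothetical). The commands are built as strings and printed as a dry run, so nothing is deleted until you execute them yourself:

```shell
#!/bin/bash
# Dry-run sketch of the copy-aside approach; all paths are hypothetical.
keep=/tmp/table_name/part-00000.csv   # the one file to preserve
target=/tmp/table_name                # directory being cleaned
scratch=/tmp/keep_one_file            # temporary holding directory

# Build the commands first so they can be reviewed before running.
cmd_mkdir="hadoop fs -mkdir -p $scratch"
cmd_move_out="hadoop fs -mv $keep $scratch/"
cmd_delete="hadoop fs -rm -r $target/*"
cmd_move_back="hadoop fs -mv $scratch/$(basename "$keep") $target/"

# Dry run: print each command instead of executing it.
printf '%s\n' "$cmd_mkdir" "$cmd_move_out" "$cmd_delete" "$cmd_move_back"
```

Note that the glob in the delete step is quoted so it is expanded by HDFS rather than by the local shell.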
There are a couple of other ways to do the same thing as well.
Below is a sample shell script that deletes all files except the ones matching a given pattern.
#!/bin/bash
echo "Executing the shell script"
for file in $(hadoop fs -ls /user/xxxx/dev/hadoop/external/csvfiles | grep -v 'a_file_pattern_to_search' | awk '{print $8}')
do
  echo "Removing $file"
  hadoop fs -rm "$file"
done
echo "Shell script ends"
It lists all the files and then uses grep with the -v option, which returns all the files other than those matching your specific pattern or filename.
Upvotes: 2