Reputation: 1023
I am trying to delete Hive staging files from Spark using the code below. This code can delete a single directory at a known path, but I want to delete everything whose name starts with '.hive-staging_hive'.
How can I delete all directories whose names start with a given prefix?
Configuration conf = new Configuration();
System.out.println("560");
Path output = new Path("hdfs://abcd/apps/hive/warehouse/mytest.db/cdri/.hive-staging_hive_2017-06-08_20-45-20_776_7391890064363958834-1/");
FileSystem hdfs = FileSystem.get(conf);
System.out.println("564");
// delete existing directory
if (hdfs.exists(output)) {
    System.out.println("568");
    hdfs.delete(output, true);
    System.out.println("570");
}
Upvotes: 1
Views: 1449
Reputation: 23109
The simplest way is to run a process from the Java program and use a wildcard to delete everything starting with ".hive-staging_hive" in a directory.
// -r is needed because the staging entries are directories
String command = "hadoop fs -rm -r pathToDirectory/.hive-staging_hive*";
int exitValue = -1;
try {
    Process process = Runtime.getRuntime().exec(command);
    // waitFor() blocks until the command finishes and returns its exit code
    exitValue = process.waitFor();
} catch (Exception e) {
    System.out.println("Cannot run command");
    e.printStackTrace();
}
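As a variation on the command approach above, here is a minimal sketch that passes the arguments as a list through ProcessBuilder instead of a single string, so a path containing spaces is not split apart. The class name HiveStagingRm and the helper buildCommand are made up for illustration, and the sketch assumes the hadoop executable is on the PATH. No local shell is involved, so the trailing * should reach hadoop fs unchanged and be expanded there.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class HiveStagingRm {
    // Hypothetical helper: builds the arguments for
    // `hadoop fs -rm -r <directory>/<prefix>*`.
    static List<String> buildCommand(String directory, String prefix) {
        // Drop a trailing slash so the joined path has exactly one separator.
        String dir = directory.endsWith("/")
                ? directory.substring(0, directory.length() - 1)
                : directory;
        return Arrays.asList("hadoop", "fs", "-rm", "-r", dir + "/" + prefix + "*");
    }

    // Runs the command and returns its exit code.
    // Assumption: `hadoop` is installed and on the PATH.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward hadoop's stdout/stderr to this process
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: HiveStagingRm <directory>");
            return;
        }
        int exit = run(buildCommand(args[0], ".hive-staging_hive"));
        System.out.println("hadoop fs exited with " + exit);
    }
}
```

A non-zero exit code generally means the command failed, so it is worth checking rather than discarding.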
The other way is to list all entries in the directory, filter those whose names start with ".hive-staging_hive", and delete them.
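import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
Path path = new Path("hdfs://localhost:9000/tmp");
// Resolve the file system from the path's URI rather than the default FS
FileSystem fs = FileSystem.get(path.toUri(), conf);
FileStatus[] fileStatus = fs.listStatus(path);
List<FileStatus> filesToDelete = new ArrayList<FileStatus>();
for (FileStatus file : fileStatus) {
    if (file.getPath().getName().startsWith(".hive-staging_hive")) {
        filesToDelete.add(file);
    }
}
for (FileStatus file : filesToDelete) {
    // true = recursive, so staging directories are removed with their contents
    fs.delete(file.getPath(), true);
}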
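To rehearse the list, filter, delete logic above without a cluster, here is a rough local analogue that uses only java.nio.file on an ordinary directory. It is a stand-in, not Hadoop API: the class LocalStagingCleaner and its methods are invented for this sketch, and on HDFS the calls would be fs.listStatus and fs.delete as shown above. Hadoop's FileSystem also has a globStatus method that accepts a path pattern, which could replace the manual startsWith filter.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalStagingCleaner {
    // Deletes every direct child of `parent` whose name starts with `prefix`
    // and returns how many children were removed.
    static int deleteByPrefix(Path parent, String prefix) throws IOException {
        // Collect the matches first, then delete, mirroring filesToDelete above.
        List<Path> toDelete = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(
                parent, entry -> entry.getFileName().toString().startsWith(prefix))) {
            for (Path entry : entries) {
                toDelete.add(entry);
            }
        }
        for (Path entry : toDelete) {
            deleteRecursively(entry);
        }
        return toDelete.size();
    }

    // Removes a file, or a directory together with everything under it.
    static void deleteRecursively(Path root) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: LocalStagingCleaner <directory>");
            return;
        }
        int removed = deleteByPrefix(Paths.get(args[0]), ".hive-staging_hive");
        System.out.println("Removed " + removed + " entries");
    }
}
```

Only direct children of the given directory are matched, which corresponds to the non-recursive listStatus call in the answer.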
Hope this helps!
Upvotes: 1