Katya Willard

Reputation: 2182

Check if a file exists in HDFS from Python

So, I've been using the fabric package in Python to run shell scripts for various HDFS tasks.

However, whenever I run tasks to check whether a file or directory already exists in HDFS, Fabric simply aborts. Here is an example (I am using Python 3.5.2 and Fabric3==1.12.post1):

from fabric.api import local


local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/')

If the directory does not exist, this code yields:

[localhost] local: hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/
stat: `hdfs://some/nonexistent/hdfs/dir/': No such file or directory

Fatal error: local() encountered an error (return code 1) while executing 'hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/'

Aborting.

I also tried local('hadoop fs -test -e hdfs://some/nonexistent/hdfs/dir/') but it caused the same issue.

How can I use Fabric to produce a boolean variable that tells me whether a directory or file exists in HDFS?

Upvotes: 5

Views: 7680

Answers (1)

2ps

Reputation: 15926

You can check the succeeded flag on the result object returned by local. The key is to run the command under warn_only so Fabric records the non-zero exit code instead of aborting.

from fabric.api import local
from fabric.context_managers import settings

file_exists = False
# warn_only keeps Fabric from aborting when the command exits non-zero
with settings(warn_only=True):
    result = local('hadoop fs -stat hdfs://some/nonexistent/hdfs/dir/', capture=True)
    # succeeded is True when the command returned exit code 0, i.e. the path exists
    file_exists = result.succeeded
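
The hadoop fs -test -e command you also tried works with the same pattern, since it exits with 0 when the path exists and 1 when it doesn't. As a rough sketch, you could wrap it in a small helper; the hdfs_path_exists name below is just for illustration, not part of Fabric:

from fabric.api import local
from fabric.context_managers import settings


def hdfs_path_exists(path):
    """Return True if the given HDFS path exists, False otherwise."""
    # warn_only keeps Fabric from aborting on a non-zero exit code
    with settings(warn_only=True):
        result = local('hadoop fs -test -e %s' % path, capture=True)
    # hadoop fs -test -e exits 0 when the path exists
    return result.return_code == 0


print(hdfs_path_exists('hdfs://some/nonexistent/hdfs/dir/'))  # False if the path is missing

Either result.succeeded or result.return_code == 0 gives you the boolean you're after; the helper just makes it reusable across paths.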

Upvotes: 1
