Spark Structured Streaming job failing in cluster mode

Question

I am using spark-sql version 2.4.1 in my application.

My Spark Structured Streaming application fails with the following error while writing data to an HDFS folder.

Error:

    yarn.Client: Deleted staging directory hdfs://dev/user/xyz/.sparkStaging/application_1575699597805_47
    20/02/24 14:02:15 ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hadoop.security.AccessControlException: Permission denied: user= xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
    .
    .
    .
    Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=xyz, access=WRITE, inode="/tmp/hadoop-admin":admin:supergroup:drwxr-xr-x
            at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:350)
            at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:251)

The error occurs only when I run the job in yarn-cluster mode, i.e.

    --master yarn \
    --deploy-mode cluster \

When I run the same job in yarn-client mode, it runs fine, i.e.

    --master yarn \
    --deploy-mode client \

What is the root cause of this problem?

More fundamentally, why is it trying to write to "/tmp/hadoop-admin/" instead of the user's own directory, i.e. hdfs://qa2/user/xyz/?

I have come across this fix:

https://issues.apache.org/jira/browse/SPARK-26825

How can I implement it in my spark-sql application?
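
For context, my understanding of that ticket is that when a streaming query has no explicit checkpoint location, Spark creates a temporary one derived from the local java.io.tmpdir, and in cluster mode that local path ends up resolved against HDFS. If that is what is happening here (an assumption, since I have not confirmed that my query lacks a checkpoint location), a possible workaround without upgrading Spark might be to set the checkpoint location explicitly at submit time. A minimal sketch, where the checkpoint path, class name, and jar name are hypothetical placeholders:

```shell
# Sketch only: the checkpoint path, main class, and jar below are hypothetical.
# spark.sql.streaming.checkpointLocation sets a default checkpoint directory
# for streaming queries that do not specify one themselves.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.streaming.checkpointLocation=hdfs:///user/xyz/checkpoints \
  --class com.example.StreamingApp \
  streaming-app.jar
```

The same location could instead be set per query through the checkpointLocation option on the stream writer; in either case the directory needs to be writable by the submitting user (xyz here).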
