sangeeth sasidharan

Reputation: 167

Spark event log not able to write to s3

I am trying to write the event log of my Spark application to S3 so it can be consumed later through the history server, but I get the warning below in the log:

WARN S3ABlockOutputStream: Application invoked the Syncable API against stream writing to /spark_logs/eventlog_v2_local-1671627766466/events_1_local-1671627766466. This is unsupported

Below is the spark config I used:

from pyspark.sql import SparkSession

spark = SparkSession.builder\
    .config("spark.eventLog.enabled", "true")\
    .config("spark.eventLog.dir", "s3a://change-data-capture-cdc-test/pontus_data_load/spark_logs")\
    .config("spark.eventLog.rolling.enabled", "true")\
    .config("spark.eventLog.rolling.maxFileSize", "10m")\
    .getOrCreate()

Only the appstatus_local-1671627766466.inprogress file is created; the actual log file is never written. With my local file system it works as expected.

Upvotes: 2

Views: 2221

Answers (1)

stevel

Reputation: 13480

The warning means exactly what it says: "the Application invoked the Syncable API against stream writing to /spark_logs/eventlog_v2_local-1671627766466/events_1_local-1671627766466. This is unsupported".

Application code persists data to a filesystem by calling sync() to flush and save; the Spark event logging is clearly doing this. And, as the warning notes, the s3a client says "no can do".

S3 is not a filesystem. It is an object store, and objects are written in single atomic operations. If you look at the S3ABlockOutputStream class (it is all open source, after all) you can see that it may upload data as it goes, but it only completes the write in close().

Therefore the log is not visible during the logging process itself; the warning is there to make clear this is happening. The file will appear once the log is closed.
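To make the visibility semantics concrete, here is a small illustrative sketch (not the real S3A client; the class and store are invented for illustration) of an object-store style output stream that buffers everything and only "publishes" the object atomically on close(), warning on sync:

```python
# Illustrative sketch only: mimics the visibility semantics of an
# object-store output stream such as S3ABlockOutputStream.
class ObjectStoreOutputStream:
    def __init__(self, store, key):
        self.store = store      # dict standing in for the object store
        self.key = key
        self.buf = []
        self.closed = False

    def write(self, data):
        # Parts may be uploaded in the background, but nothing is
        # visible under the key until close() completes the write.
        self.buf.append(data)

    def hsync(self):
        # The real client logs a warning here: sync cannot be honoured.
        print("WARN: Syncable API invoked against object store stream. Unsupported")

    def close(self):
        # Single atomic completion: the object appears only now.
        self.store[self.key] = "".join(self.buf)
        self.closed = True

store = {}
out = ObjectStoreOutputStream(store, "spark_logs/eventlog")
out.write("event data")
print("spark_logs/eventlog" in store)   # False: nothing visible before close()
out.close()
print("spark_logs/eventlog" in store)   # True: visible only after close()
```

This is why only the small appstatus marker shows up while the application runs: the rolling event log file is still an incomplete multipart write until the stream is closed.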

If you want, you can set spark.hadoop.fs.s3a.downgrade.syncable.exceptions to false and the client will raise an exception instead of downgrading it to a warning. That really makes clear to applications like HBase that the filesystem lacks the semantics they need.
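Assuming the Hadoop 3.3.1+ default of fs.s3a.downgrade.syncable.exceptions=true (warn and continue), a minimal sketch of flipping it so Syncable calls fail fast (config fragment; the bucket path is from the question):

```python
from pyspark.sql import SparkSession

# Sketch: spark.hadoop.* keys are passed through to the Hadoop
# configuration used by the s3a client.
spark = SparkSession.builder\
    .config("spark.eventLog.enabled", "true")\
    .config("spark.eventLog.dir", "s3a://change-data-capture-cdc-test/pontus_data_load/spark_logs")\
    .config("spark.hadoop.fs.s3a.downgrade.syncable.exceptions", "false")\
    .getOrCreate()
```

With this set, hsync()/hflush() on an s3a stream raises an UnsupportedOperationException rather than logging the warning, so the mismatch surfaces immediately.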

Upvotes: 4
