Łukasz Kastelik

Reputation: 649

Extracting Spark logs (Spark UI contents) from Databricks

I am trying to save the Apache Spark logs (the contents of the Spark UI), not necessarily the stderr, stdout and log4j files (although those might be useful too), to a file so that I can send it to someone else to analyze.

I am following the procedure described in the Apache Spark documentation here: https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact

The problem is that I am running the code on Azure Databricks. Databricks stores the logs elsewhere; you can display them in the web UI, but you cannot export them. When I ran the Spark job with spark.eventLog.dir set to a location in DBFS, the file was created but it was empty.
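For reference, this is roughly how I set the event log options (a minimal sketch; the DBFS path is illustrative, and these settings are read once at application startup, so they have to be in place before the SparkSession is created):

    # Sketch of the event log settings I used; the path is a placeholder.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .config("spark.eventLog.enabled", "true")
        .config("spark.eventLog.dir", "dbfs:/tmp/spark-events")
        .getOrCreate()
    )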

Is there a way to export the full Databricks job log so that anyone can open it without being given access to the workspace?

Upvotes: 1

Views: 1712

Answers (1)

Alex Ott

Reputation: 87359

The simplest way of doing it is the following:

  • You create a separate storage account with a container in it (or a separate container in an existing storage account) and give developers access to it
  • You mount that container to the Databricks workspace (see the mount sketch after this list)
  • You configure clusters/jobs to write logs into the mount location (you can enforce it for new objects using cluster policies; see the cluster sketch after this list). This will create sub-directories named after the cluster, containing the logs of the driver & executors plus the output of init scripts


  • (optional) You can set up a retention policy on that container to automatically remove old logs (see the cleanup sketch after this list).
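For the mount step, a minimal sketch (the storage account, container, and secret scope names are placeholders; dbutils is available out of the box in Databricks notebooks):

    # Mount an Azure Blob Storage container at /mnt/cluster-logs.
    # Account, container, and secret names are placeholders.
    dbutils.fs.mount(
        source="wasbs://logs@mystorageaccount.blob.core.windows.net",
        mount_point="/mnt/cluster-logs",
        extra_configs={
            "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
                dbutils.secrets.get(scope="logs-scope", key="storage-account-key"),
        },
    )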
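To point a cluster's log delivery at the mount, you can set cluster_log_conf when creating the cluster, for example via the Clusters REST API (the workspace URL, token, runtime, and node type below are placeholders):

    # Create a cluster that delivers driver/executor logs to the mounted container.
    import requests

    cluster_spec = {
        "cluster_name": "logged-cluster",
        "spark_version": "11.3.x-scala2.12",   # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",     # placeholder node type
        "num_workers": 2,
        # Log delivery destination: the mount created above.
        "cluster_log_conf": {
            "dbfs": {"destination": "dbfs:/mnt/cluster-logs"}
        },
    }

    resp = requests.post(
        "https://<workspace-url>/api/2.0/clusters/create",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json=cluster_spec,
    )
    resp.raise_for_status()

The same destination can be pinned for all new clusters with a cluster policy that fixes the cluster_log_conf.type and cluster_log_conf.path attributes.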
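For retention, Azure Blob lifecycle management rules on the storage account are the usual mechanism (configured in the portal or via the Azure CLI). As a rough illustration of the same idea in code, here is a manual cleanup sketch using the azure-storage-blob package; the connection string and container name are placeholders, and the 30-day cutoff is arbitrary:

    # Delete log blobs older than 30 days from the logs container.
    from datetime import datetime, timedelta, timezone
    from azure.storage.blob import ContainerClient

    container = ContainerClient.from_connection_string(
        "<storage-connection-string>", "cluster-logs"
    )
    cutoff = datetime.now(timezone.utc) - timedelta(days=30)

    for blob in container.list_blobs():
        if blob.last_modified < cutoff:
            container.delete_blob(blob.name)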

Upvotes: 2
