nojo

Reputation: 1065

Dataproc - setting driverLogLevels results in log4j error

I'm attempting to set driver log levels when launching jobs on Dataproc (https://cloud.google.com/dataproc/docs/reference/rest/v1/projects.regions.jobs#LoggingConfig). Jobs are launched from a Java program using the Dataproc SDK.

LoggingConfig loggingConfig = new LoggingConfig();
loggingConfig.put("driverLogLevels", Collections.singletonMap("root", "ERROR"));

com.google.api.services.dataproc.model.SparkJob sparkJob = new com.google.api.services.dataproc.model.SparkJob()
        .setMainClass(mainClass)
        .setJarFileUris(jarFileUris)
        .setArgs(args)
        .setProperties(properties)
        .setLoggingConfig(loggingConfig);

Job job = new Job()
        .setPlacement(new JobPlacement().setClusterName(clusterName))
        .setSparkJob(sparkJob);

// omitted irrelevant code

Dataproc dp = new Dataproc.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName(jobName)
        .build();
SubmitJobRequest request = new SubmitJobRequest().setJob(job);
return dp.projects().regions().jobs().submit(googleProject, "global", request).execute();

The job launches successfully, but the log4j configuration is never applied:

log4j:ERROR Could not read configuration file from URL [file:/tmp/[guid]/driver_log4j.properties].
java.io.FileNotFoundException: /tmp/[guid]/driver_log4j.properties (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at java.io.FileInputStream.<init>(FileInputStream.java:93)
    at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:90)
    at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:188)
    at org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:557)
    at org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:526)
    at org.apache.log4j.LogManager.<clinit>(LogManager.java:127)
    at org.apache.spark.internal.Logging$class.initializeLogging(Logging.scala:117)
    at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:102)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.initializeLogIfNecessary(ApplicationMaster.scala:736)
    at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.log(ApplicationMaster.scala:736)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:751)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
log4j:ERROR Ignoring configuration file [file:/tmp/[guid]/driver_log4j.properties].
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

where [guid] is a GUID that differs for every job. Logging then falls back to Spark's (verbose) default configuration.

How can I get this configuration to take effect? What is the most elegant and robust way to adjust Spark log levels on Dataproc? A manual workaround would be a fallback, but I'd rather use a method that's not liable to change out from under me.

Upvotes: 1

Views: 665

Answers (1)

Axel Magnuson

Reputation: 1202

The official way to set the driver log level is the method described in your link; see the Dataproc docs.

I believe the way to invoke this from the Java SDK is through the setArgs(...) call on your builder. So in your case you would want to add:

args.add("--driver-log-levels");
args.add("root=ERROR");

like so:

args.add("--driver-log-levels");
args.add("root=ERROR");

com.google.api.services.dataproc.model.SparkJob sparkJob = new com.google.api.services.dataproc.model.SparkJob()
        .setMainClass(mainClass)
        .setJarFileUris(jarFileUris)
        .setArgs(args)
        .setProperties(properties)
        .setLoggingConfig(loggingConfig);

Job job = new Job()
        .setPlacement(new JobPlacement().setClusterName(clusterName))
        .setSparkJob(sparkJob);

// omitted irrelevant code

Dataproc dp = new Dataproc.Builder(httpTransport, jsonFactory, credential)
        .setApplicationName(jobName)
        .build();
SubmitJobRequest request = new SubmitJobRequest().setJob(job);
return dp.projects().regions().jobs().submit(googleProject, "global", request).execute();
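
If you want to keep using the LoggingConfig field itself, the generated model class also exposes a typed setter, so the raw put(...) isn't needed. A minimal sketch, assuming the setDriverLogLevels(Map<String, String>) setter generated in the google-api-services-dataproc client (whether the service honors the field is a separate question):

import com.google.api.services.dataproc.model.LoggingConfig;
import java.util.Collections;

// Typed equivalent of loggingConfig.put("driverLogLevels", ...):
LoggingConfig loggingConfig = new LoggingConfig()
        .setDriverLogLevels(Collections.singletonMap("root", "ERROR"));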

I'm not sure what you mean when you call this a feature that's liable to change out from under you. This should be a stable feature.
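
For reference, this is the same flag the docs show for gcloud-based submission; the cluster name, main class, and jar URI below are placeholders:

gcloud dataproc jobs submit spark \
    --cluster my-cluster \
    --class com.example.Main \
    --jars gs://my-bucket/my-job.jar \
    --driver-log-levels root=ERROR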

Upvotes: 1
