Luis A.G.

Reputation: 1097

Avoid Google Dataproc logging

I'm running millions of operations on Google Dataproc and have one problem: the size of the logging data. I don't call show or any other kind of print, but the 7 INFO lines, multiplied by millions, add up to a really large volume of logs.

Is there any way to stop Google Dataproc from logging these lines?

I already tried the following in Dataproc, without success:

https://cloud.google.com/dataproc/docs/guides/driver-output#configuring_logging

These are the 7 lines I want to get rid of:

18/07/30 13:11:54 INFO org.spark_project.jetty.util.log: Logging initialized @...

18/07/30 13:11:55 INFO org.spark_project.jetty.server.Server: ....z-SNAPSHOT

18/07/30 13:11:55 INFO org.spark_project.jetty.server.Server: Started @...

18/07/30 13:11:55 INFO org.spark_project.jetty.server.AbstractConnector: Started ServerConnector@...

18/07/30 13:11:56 INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase: GHFS version: ...

18/07/30 13:11:57 INFO org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at ...

18/07/30 13:12:01 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_...

Upvotes: 0

Views: 495

Answers (1)

Lefteris S

Reputation: 1672

What you are looking for is an exclusion filter. In the Console, go to Stackdriver Logging > Logs ingestion > Exclusions and click "Create exclusion". As explained there:

To create a logs exclusion, edit the filter on the left to only match logs that you do not want to be included in Stackdriver Logging. After an exclusion has been created, matched logs will no longer be accessible in Stackdriver Logging.

In your case, the filter should be something like this:

resource.type="cloud_dataproc_cluster"
textPayload:"INFO org.spark_project.jetty.util.log: Logging initialized"
...
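
If it helps, here is a minimal sketch of how the complete filter could be assembled from the logger names in the question, rather than typing each line by hand. The helper name `build_exclusion_filter` is hypothetical, and the sketch assumes the messages arrive in `textPayload` (as in the filter above) and that combining the clauses with `AND`/`OR` suits your case; check the result in the Logs Viewer before creating the exclusion.

```python
# Logger names copied from the seven INFO lines in the question.
# (jetty.server.Server appears twice there, so six names cover all seven lines.)
NOISY_LOGGERS = [
    "org.spark_project.jetty.util.log",
    "org.spark_project.jetty.server.Server",
    "org.spark_project.jetty.server.AbstractConnector",
    "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase",
    "org.apache.hadoop.yarn.client.RMProxy",
    "org.apache.hadoop.yarn.client.api.impl.YarnClientImpl",
]


def build_exclusion_filter(loggers, resource_type="cloud_dataproc_cluster"):
    """Return a filter string matching INFO lines from the given loggers.

    Hypothetical helper: it only builds text to paste into the
    "Create exclusion" form; it does not call any Google Cloud API.
    """
    # ':' is the "has" (substring) operator, the same one used in the filter above.
    clauses = ['textPayload:"INFO {}:"'.format(name) for name in loggers]
    joined = " OR ".join(clauses)
    return 'resource.type="{}" AND ({})'.format(resource_type, joined)


if __name__ == "__main__":
    print(build_exclusion_filter(NOISY_LOGGERS))
```

Running the script prints a single filter string that you can paste into the exclusion filter box.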

Upvotes: 2
