Rohan Ahire

Reputation: 31

logging info/debug messages in google cloud apache beam python sdk

I want to print info, debug, and exception messages to the console while running a Dataflow program. I am able to do this when running the pipeline with the "DirectRunner" runner, but the same program does not print anything to the Dataflow console when run with the "DataflowRunner" runner. Here is the code; it is very basic.

 import apache_beam as beam
 from apache_beam.io import ReadFromText
 from apache_beam.io import WriteToText
 from apache_beam.options.pipeline_options import PipelineOptions
 from apache_beam.options.pipeline_options import SetupOptions
 import logging

 class ProcessData(beam.DoFn):

   def process(self, element, var):
     logging.getLogger().setLevel(logging.INFO)
     logging.info("Print the element %s",element)
     logging.info("Print the var %s",var)

 logging.getLogger().setLevel(logging.INFO)
 #Initialize the pipeline
 pipeline_options = PipelineOptions()
 pipeline_options.view_as(SetupOptions).save_main_session = True
 p = beam.Pipeline(options=pipeline_options)

 p | 'Read the data file' >> beam.io.textio.ReadFromText('gs://rohan_staging/data/test.txt') | 'Process Data' >> beam.ParDo(ProcessData(),1)
 p.run()

I was able to see the messages on the console earlier, but suddenly I stopped seeing them. I don't know what I did wrong or what I was doing differently before. Please suggest how I can see info messages on the Cloud Dataflow console.

Upvotes: 0

Views: 10442

Answers (4)

Richa

Reputation: 1

I had a similar issue, and I was able to follow this guide to set up Cloud Logging in Python. Essentially, you need to connect the Python root logger to the Cloud Logging library for the logs to show up.
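For reference, the usual entry point in the google-cloud-logging library is `Client().setup_logging()`, which attaches a Cloud Logging handler to Python's root logger. The sketch below reproduces that root-logger wiring with a plain in-memory handler (the `ListHandler` class and the `my.pipeline` logger name are illustrative, not part of any library), so it runs without GCP credentials and shows why logs emitted anywhere in the program reach the root handler:

```python
import logging

# With google-cloud-logging installed and GCP credentials configured, the
# wiring described above boils down to (not run here, needs GCP access):
#
#   from google.cloud import logging as cloud_logging
#   cloud_logging.Client().setup_logging(log_level=logging.INFO)
#
# setup_logging() attaches a handler to Python's root logger. The same
# mechanism, with a plain in-memory handler instead:

class ListHandler(logging.Handler):
    """Collects every record that propagates up to the root logger."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

root = logging.getLogger()
root.setLevel(logging.INFO)
handler = ListHandler()
root.addHandler(handler)

# A logger created in any module propagates to the root handlers by default;
# this is how Cloud Logging picks up messages logged via the stdlib logger.
logging.getLogger("my.pipeline").info("Print the element %s", "hello")

print(handler.records[0].getMessage())  # → Print the element hello
```

The key point is that the message was logged on a child logger but still reached the handler attached to the root logger; Cloud Logging relies on exactly this propagation.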

Upvotes: 0

xirix

Reputation: 340

For me, the step logs don't appear in the Dataflow console. I need to go into Stackdriver and use an advanced filter:

resource.type="dataflow_step"
resource.labels.job_name="my job name"
resource.labels.step_id:"my step name"

This way I can see the log messages from my job step that were logged using the Python logger.
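The same filter fragment can also be passed to the gcloud CLI instead of the Stackdriver UI. A sketch, assuming an installed and authenticated gcloud; the job and step names are placeholders:

```shell
# Read the step logs for a Dataflow job from the command line.
# The multi-line filter is an implicit AND of the three conditions.
gcloud logging read \
  'resource.type="dataflow_step"
   resource.labels.job_name="my job name"
   resource.labels.step_id:"my step name"' \
  --limit=50
```

Note the `:` in `step_id:"my step name"` is a substring match, while `=` requires an exact match.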

Upvotes: 1

Stu S

Reputation: 1

In case anyone else stumbles upon this: I'm still seeing this issue, but the behavior changed over time. Earlier in the day, the logs appeared in the Stackdriver pop-up; later in the day, they stopped showing up.

It seems that, in general, the Python Beam runner is not maintained very well.

Upvotes: 0

dsesto

Reputation: 8178

I see no issue in the snippet you shared; in fact, it is compliant with all the steps in the documentation on Logging Pipeline Messages in Dataflow. I therefore ran a sample pipeline with your code and verified that everything is being logged successfully (see the Print the element... logs in screenshots 2 and 3):

  • Job logs (screenshot 1)

  • Logs at the Process Data step (screenshot 2)

  • Logs in Stackdriver Logging (screenshot 3)


As explained in the logging documentation I linked earlier, the Step Logs and Job Logs tabs only show the most recent and relevant logs for the step or job, respectively, so you should go to the Stackdriver logs of your pipeline to get a complete view of your logs (which you can then filter as you prefer).

Given that you said you were able to see the logs earlier but not anymore, several things could be happening:

  1. You were previously inspecting the Step logs (where the logs you added are shown), and now you are looking at the Job logs (which do not display them).
  2. The logs have disappeared from the Step logs tab, which only shows recent logs.
  3. Your logs have expired from Stackdriver (as per the retention limits).

Upvotes: 3
