leonardo

Reputation: 145

Python application using Google Pub/Sub killed by the Linux kernel (out of memory)

I am using Pub/Sub to publish logs from an IoT device to the cloud, where a Cloud Function stores them in Cloud Logging. This had been working fine, but now messages are sometimes not delivered and the application eventually gets killed. This is the error message:

google.api_core.exceptions.RetryError: Deadline of 60.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7487bd20>, topic: "projects/projectid/topics/iot_device_logs"
messages {
  data: "20210612T04:09:22.116Z - ERROR - Failed to create objects on main "
  attributes {
    key: "device_num_id"
    value: "devincenumid"
  }
  attributes {
    key: "logger_name"
    value: "iotXX"
  }
}
, metadata=[('x-goog-request-params', 'topic=projects/projectid/topics/iot_device_logs'), ('x-goog-api-client', 'gl-python/3.7.3 grpc/1.33.2 gax/1.23.0 gccl/2.5.0')]), last exception: 503 Transport closed
20210612T04:21:08.211Z - INFO - QueryDevice object created
20210612T04:38:30.880Z - DEBUG - Analyzer failure counts
20210612T04:42:40.760Z - INFO - Attempting to query device 
20210612T04:48:05.126Z - DEBUG - Attempting to publish 'info' log on iotXX
bash: line 1:   609 Killed                  python3.7 path/to/file.py

The code in question is something like this:

def get_callback(self, f, log):
    def callback(f):
        try:
            self.debug(f"Successfully published log: {f.result()}")
        except Exception as e:
            self.debug(f"Failed to publish log: {e}")

    return callback

def publish_log(self, log, severity):
    # data must be in bytestring
    data = log.encode("utf-8")

    try:
        self.debug(f"Attempting to publish '{severity}' log on {self.name}")
        # add two attributes to distinguish the log once in the cloud
        future = PUBLISHER.publish(
            TOPIC_PATH, data=data, logger_name=self.name, device_num_id=self.deviceid)
        futures[log] = future
        # publish failures shall be handled in the callback function
        future.add_done_callback(self.get_callback(future, log))

    except Exception as e:
        self.debug(f"Error on publish_log: {e}")

I believe this happens during a connection outage, so I can understand that the messages cannot be sent. However, I don't understand why the application is being killed.

So far, I am trying to change the retry settings to see whether that helps, but I am concerned that the application will continue to get killed.

Any idea how to determine why it is being killed instead of simply failing to send and continuing?

Upvotes: 0

Views: 386

Answers (1)

leonardo

Reputation: 145

I seem to have found the problem, and it is not what I expected. I am posting an answer in case someone else runs into a similar problem, so that they are not misled the way I was.

In my case, the connection problem coincided with my application being killed, but as far as I can tell it was not the cause: neither Pub/Sub nor its retry settings had anything to do with the application getting killed.

In the kernel logs I found a more descriptive message saying that the application had been killed by the out-of-memory (OOM) reaper because it was consuming too much RAM.
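For anyone who wants to check for the same thing, here is a rough sketch of how kernel log text could be scanned for OOM kills. It assumes the kernel writes a line containing "Killed process <pid> (<name>)", which is the wording I would expect but which may differ between kernel versions. The function name `find_oom_kills` is made up for this example.

```python
import re

# Assumption: the kernel logs "Killed process <pid> (<name>)" when the
# OOM killer fires. Adjust the pattern if your kernel words it differently.
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text):
    """Return a list of (pid, process_name) pairs found in the log text."""
    return [
        (int(match.group(1)), match.group(2))
        for match in OOM_PATTERN.finditer(kernel_log_text)
    ]
```

The text to scan can come from wherever your system keeps kernel messages, for example the output of `dmesg` or `journalctl -k` saved to a file. If the PID reported there matches the one in the shell's "Killed" line, the OOM killer is the likely culprit.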

It turns out I had a memory leak in my program. I was not handling the futures generated by the Pub/Sub publisher properly, so they kept accumulating and consuming memory.
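A minimal sketch of one way to avoid this kind of leak, assuming the growth came from a container like the `futures` dict in the question that was filled on every publish and never emptied. The class name `LogPublisher`, the `_pending` dict and `pending_count` are hypothetical names for this example. The `publisher` argument is assumed to behave like the Pub/Sub publisher client, whose `publish()` returns a future that supports `add_done_callback()` and `result()`.

```python
import threading


class LogPublisher:
    """Tracks in-flight publish futures and forgets each one once it is done."""

    def __init__(self, publisher, topic_path, name, deviceid):
        self.publisher = publisher  # assumed: a Pub/Sub-like publisher client
        self.topic_path = topic_path
        self.name = name
        self.deviceid = deviceid
        self._pending = {}  # key -> future, only for publishes still in flight
        self._lock = threading.Lock()  # callbacks may run on another thread
        self._next_key = 0

    def debug(self, message):
        print(message)

    def _make_callback(self, key):
        def callback(future):
            # Drop our reference first so the finished future can be freed,
            # whether the publish succeeded or failed.
            with self._lock:
                self._pending.pop(key, None)
            try:
                self.debug(f"Successfully published log: {future.result()}")
            except Exception as e:
                self.debug(f"Failed to publish log: {e}")

        return callback

    def publish_log(self, log, severity):
        data = log.encode("utf-8")  # data must be a bytestring
        self.debug(f"Attempting to publish '{severity}' log on {self.name}")
        future = self.publisher.publish(
            self.topic_path, data=data,
            logger_name=self.name, device_num_id=self.deviceid)
        # Use a small integer key rather than the log text itself, and
        # register the future before attaching the callback.
        with self._lock:
            key = self._next_key
            self._next_key += 1
            self._pending[key] = future
        future.add_done_callback(self._make_callback(key))
        return future

    def pending_count(self):
        with self._lock:
            return len(self._pending)
```

With this shape, the number of stored futures is bounded by the number of publishes still in flight, not by the total number of logs ever sent. During a long outage the in-flight count can still grow, so it may be worth logging `pending_count()` or dropping logs above some limit.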

Upvotes: 1
