Reputation: 145
I am using pub/sub to publish logs from an IoT device to the cloud, where they're stored in cloud logging by a cloud function. This had been working fine, but now I am running into issues where the messages are not being delivered and eventually the application gets killed. This is the error message:
google.api_core.exceptions.RetryError: Deadline of 60.0s exceeded while calling functools.partial(<function _wrap_unary_errors.<locals>.error_remapped_callable at 0x7487bd20>, topic: "projects/projectid/topics/iot_device_logs"
messages {
data: "20210612T04:09:22.116Z - ERROR - Failed to create objects on main "
attributes {
key: "device_num_id"
value: "devincenumid"
}
attributes {
key: "logger_name"
value: "iotXX"
}
}
, metadata=[('x-goog-request-params', 'topic=projects/projectid/topics/iot_device_logs'), ('x-goog-api-client', 'gl-python/3.7.3 grpc/1.33.2 gax/1.23.0 gccl/2.5.0')]), last exception: 503 Transport closed
20210612T04:21:08.211Z - INFO - QueryDevice object created
20210612T04:38:30.880Z - DEBUG - Analyzer failure counts
20210612T04:42:40.760Z - INFO - Attempting to query device
20210612T04:48:05.126Z - DEBUG - Attempting to publish 'info' log on iotXX
bash: line 1: 609 Killed python3.7 path/to/file.py
The code in question is something like this:
def get_callback(self, f, log):
def callback(f):
try:
self.debug(f"Successfully published log: {f.result()}")
except Exception as e:
self.debug(f"Failed to publish log: {e}")
return callback
def publish_log(self, log, severity):
# data must be in bytestring
data = log.encode("utf-8")
try:
self.debug(f"Attempting to publish '{severity}' log on {self.name}")
# add two attributes to distinguish the log once in the cloud
future = PUBLISHER.publish(
TOPIC_PATH, data=data, logger_name=self.name, device_num_id=self.deviceid)
futures[log] = future
# publish failures shall be handled in the callback function
future.add_done_callback(self.get_callback(future, log))
except Exception as e:
self.debug(f"Error on publish_log: {e}")
I believe this is happening during a connection outage, which I can understand it might not be able to send the messages. However, I don't understand why the application is being killed.
So far, I am trying to change the retry settings to see if it improves. But I am concerned that the application will continue to get killed.
Any idea on how to determine why it is being killed instead of simply failing to send and continue on?
Upvotes: 0
Views: 386
Reputation: 145
I seem to have found out the problem, and it is not what I was thinking. I am posting an answer in case someone else is confused by a similar problem and hopefully they're not misguided.
In my case, the connection problem coincided with my application being killed. But as far as I can tell, this was not the reason and pubsub or its retry settings had nothing to do with my application getting killed.
I found on the kernel logs a more descriptive message saying that the application had been killed by an out of memory reaper because it was consuming too much ram.
Turns out I had a memory leak on my program. I was not handling the futures generated by the pubsub publisher properly, so they kept adding up and consuming memory.
Upvotes: 1