Reputation: 83
A Python 2.7 program called 'eventcollector' runs continuously and polls a webservice for events. It then appends each event as a JSON object to the end of a file, /var/log/eventsexample.json. An Agent follows the file and sends the events up to cloud-based software called 'anycloud' that processes the events.
I need to make eventcollector a well-behaved UNIX daemon and then make that daemon a service in systemd. The systemd .service unit I will create for this purpose will tell systemd that, when stopping this service, it must wait 15 seconds after sending SIGTERM before sending SIGKILL. This gives eventcollector time to save state and close the files it is writing (its own log file and the event file). I must now make the program more resilient: it must be able to save its state so that when it is terminated and restarted, it knows where it left off.
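For concreteness, a minimal sketch of the unit I have in mind (the paths are placeholders; TimeoutStopSec is what controls the gap between SIGTERM and SIGKILL on stop):

    [Unit]
    Description=eventcollector - polls a webservice and appends events to a file

    [Service]
    ExecStart=/usr/bin/python2.7 /opt/eventcollector/eventcollector.py
    # On "systemctl stop", systemd sends SIGTERM, waits this long, then SIGKILL.
    TimeoutStopSec=15

    [Install]
    WantedBy=multi-user.target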
Eventcollector has no visibility into anycloud; it can only see events in the source service. If eventcollector dies because of a restart, it must reliably know what its new start_time is when it queries the source service for events. Therefore, finishing the critical business of writing events to the file and saving state before exiting is critical.
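Since a truncated state file on restart would be as bad as no state file, I would also make the save itself atomic. A minimal sketch, assuming pickled state and a hypothetical path: write to a temporary file, then rename it into place, because rename() is atomic on POSIX filesystems.

    import os
    import pickle

    STATE_PATH = '/var/lib/eventcollector/state.pickle'   # hypothetical path

    def save_state(state):
        # Write the new state beside the old one, then atomically replace it,
        # so a kill in mid-save can never leave a truncated state file.
        tmp = STATE_PATH + '.tmp'
        with open(tmp, 'wb') as f:
            pickle.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.rename(tmp, STATE_PATH)    # atomic on POSIX filesystems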
My question is specifically about how to handle the SIGTERM such that the program has time to finish what it is doing and then save its state.
My concern, however, is that unless I write state after every message I write to the file (which would consume more resources than seems necessary), I cannot be sure my program won't be terminated before it saves state in time. The impact would be duplicate messages, and duplicate messages are not acceptable.
If I must take the performance hit, I will, but I would prefer a way to handle SIGTERM gracefully so that the program can, for example, do something like the following (simplified pseudocode excerpt):
    while True:
        response = query_webservice()   # placeholder: returns a list of
                                        # 100 dictionaries (events)
        for i in response.data:
            event = json.dumps(i)
            outputfile.write(event)     # <- SIGTERM arrives during the 2nd
                                        #    event, but do not exit until
                                        #    the for loop is done. (how?)

    def handle_sigterm(signum, frame):
        pickle.dump(current_state, state_file)   # pickle the current state
The idea is that even if the SIGTERM were received while the 2nd event is being written, the program would wait until it had written the 100th event before deciding it is safe to handle the SIGTERM.
I read in https://docs.python.org/2/library/signal.html:
There is no way to “block” signals temporarily from critical sections (since this is not supported by all Unix flavors).
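The workaround I know of in CPython is to have the handler only set a flag, and to test that flag at a point where it is safe to stop, i.e. after a whole batch has been written. A minimal sketch of that pattern (query_events and load_state are placeholders for my own code; save_state could be the atomic version above; I am assuming each event carries a timestamp I can resume from):

    import json
    import signal

    shutdown_requested = False

    def handle_sigterm(signum, frame):
        # Do no I/O here; just note that the signal arrived.
        global shutdown_requested
        shutdown_requested = True

    signal.signal(signal.SIGTERM, handle_sigterm)

    start_time = load_state()                # placeholder: restore last position
    outputfile = open('/var/log/eventsexample.json', 'a')

    while not shutdown_requested:
        events = query_events(start_time)    # placeholder: up to 100 event dicts
        for event in events:
            outputfile.write(json.dumps(event) + '\n')
        outputfile.flush()
        if events:
            start_time = events[-1]['timestamp']   # assumes events carry a timestamp
            save_state(start_time)           # persist the new start_time

    # We only reach this point between batches, so the saved state is consistent.
    outputfile.close()

Because CPython only runs Python-level signal handlers between bytecode instructions of the main thread, the for loop is never cut short; the worst case is that one last full batch is written after the SIGTERM arrives, well inside the 15-second window.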
Another idea I had seemed too complex, and it seemed to me that there must be an easier way.
I'm considering using python-daemon, which I understand to be Ben Finney's reference implementation of the PEP he wrote, [PEP 3143](https://www.python.org/dev/peps/pep-3143/). I understand, based on what he has written and on my own experience with UNIX and UNIX-like OSes, that what constitutes "good behavior" on the part of a daemon is not universally agreed upon. I mention this because I do agree with PEP 3143 and would like to implement it; however, it does not answer my current question about how to deal with signals as I would like to.
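If I do adopt python-daemon, my understanding from PEP 3143 is that DaemonContext takes a signal_map, so the same flag-setting handler could be installed through it rather than via signal.signal (a sketch, assuming python-daemon is installed):

    import signal
    import daemon   # python-daemon, the PEP 3143 reference implementation

    shutdown_requested = False

    def handle_sigterm(signum, frame):
        global shutdown_requested
        shutdown_requested = True

    # Override the default signal_map (which terminates on SIGTERM) so the
    # main loop can instead exit cleanly at a batch boundary.
    context = daemon.DaemonContext(
        signal_map={signal.SIGTERM: handle_sigterm},
    )

    with context:
        run_main_loop()   # placeholder: the batch loop sketched above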
Upvotes: 0
Views: 852
Reputation: 504
Your daemon is in Python 2.7, and Python is not so convenient for making syscalls, which makes approaches based on /dev/shm or semaphores awkward. I am also not sure about the side effects and caveats of using global variables in Python. A file lock is fragile, and filesystem I/O is a bad idea inside a signal handler. So I do not have a perfect answer, only ideas. Here is what I did when I was implementing a small daemon in C.
Upvotes: 1