How to cache imported modules once off for multiple runs of a python script?

Question

I have a python script that I would like to run periodically on a 1 minute basis using a cron job. The script imports some python modules and config files each time it is run. The issue is, there is some large overhead (1-2 minutes) with all the imports. (The modules and files are relatively small; the total size is only 15 MB, so can easily fit in memory).

Once everything is imported, the rest of the script runs relatively quickly (about 0.003 seconds; it's not computationally demanding).

Is it possible to cache all the imports, once, the very first time the script is run, so that all subsequent times the script is run there is no need to import the modules and files again?

Marius Mucenicu · Accepted Answer

No, you can't. You would have to use persistent storage, like shelve, or something in-memory such as SQLite, where you'd store any expensive computations that persist between sessions, and subsequently you'd just read those results from memory/disk, depending on your chosen storage.

Moreover, do note modules are, in fact, being cached upon import in order to improve load time, however, not in memory, but rather on disk, as .pyc files under __pychache__ and the import time per se is insignificant in general, so your imports take that long not because of the import itself, but because of the computations inside those modules, so you might want to optimise those.

The reason you can't do what you want is because in order to keep data in memory, the process must keep running. Memory belongs to the process running the script, and once that script finished, the memory is freed. See here for additional details regarding your issue.

You can't just run a script and fill the memory with whatever computations you have until you might run it another time, because, firstly, the memory wouldn't know when that other time would be (it might be 1 min later, it might be 1 year later) and secondly, if you would be able to do that, then imagine how shortly you'd run out of memory when different scripts from different applications across the OS (it's not just your program out there) would fill the memory with the results of their computations.

So you can either run your code in an indefinite loop with sleep (and keep the process active) or you can use a crontab and store your previous results somewhere.

How to cache imported modules once off for multiple runs of a python script?

Answers (1)

Related Questions