Reputation: 136177
I would like to log the current memory usage of a Python script in a production system. AWS has Container Insights, but it is extremely well-hidden and I'm not sure how to use it properly within other dashboards / logging and alerting systems. I'm also not certain if it logs peak memory at all.
The Python script is the production system. It is running on AWS within a Docker container and I ran into issues with a previous approach (link).
tracemalloc seems to be able to give me the information I want:
# At the start of the script
import tracemalloc
tracemalloc.start()
# script running...
# At the end
current, peak = tracemalloc.get_traced_memory()
logger.info(f"Current memory usage is {current / 10**6} MB")
logger.info(f"Peak memory usage was {peak / 10**6} MB")
tracemalloc.stop()
However, the docs state:
The tracemalloc module is a debug tool
So would it be a bad idea to wrap this around production code? How much overhead does it add? Are there other reasons not to use it in production?
(I have a pretty good idea of which parts of the code need most memory and where the peak memory is reached. I want to monitor that part (or maybe rather the size of those few objects / few lines of code). The alternative to tracemalloc seems to be to use something like this)
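Since only a specific part of the script is of interest, tracing can be limited to just that section so the allocator hooks are not active for the rest of the run. A minimal sketch of that idea (the `run_heavy_step` function is a stand-in for the memory-intensive part of the script):

```python
import tracemalloc

def run_heavy_step():
    # placeholder for the memory-intensive part of the script
    data = [bytes(1000) for _ in range(10_000)]
    return len(data)

tracemalloc.start()          # hooks the allocator only from here on
run_heavy_step()
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()           # restores the original allocator, removing the overhead
print(f"Peak during step: {peak / 10**6:.1f} MB")
```

Note that `get_traced_memory()` only counts allocations made while tracing was active, so memory allocated before `start()` is invisible to it.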
Upvotes: 4
Views: 3217
Reputation: 11
I ran into this problem while working with a large dataframe and can confirm the 3-4x performance hit.
In my tests, the following snippet ran in 41-42s:
import pandas as pd
def preprocess():
data = pd.read_csv('<filename>')
# additional preprocessing logic
But just importing tracemalloc increased that time to >120s:
import tracemalloc
import pandas as pd
def preprocess():
data = pd.read_csv('<filename>')
# additional preprocessing logic
However, per this article, you can profile resident memory with psutil:
import psutil
psutil.Process().memory_info().rss / 1024**2  # bytes -> MiB
And doing so incurs no measurable performance hit, since it reads the process's resident set size from the OS instead of intercepting allocations.
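One way to use this in practice, sketched here, is to sample RSS before and after the section you care about (`rss_mib` is a made-up helper name, and the list comprehension is a stand-in for real work):

```python
import psutil

def rss_mib() -> float:
    """Resident set size of the current process in MiB."""
    return psutil.Process().memory_info().rss / 1024**2

baseline = rss_mib()
data = [bytes(10_000) for _ in range(1_000)]  # stand-in for real work
print(f"RSS grew by {rss_mib() - baseline:.1f} MiB")
```

The trade-off versus tracemalloc is that RSS is a whole-process, OS-level number: it cannot attribute memory to specific lines of code, and it will not catch a peak that occurs between two samples.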
Upvotes: 1
Reputation: 168
tracemalloc wraps the regular memory allocator with its own allocation function to perform additional bookkeeping for memory allocations/de-allocations. This happens on every memory allocation call, so if you are creating a large number of objects, this will significantly slow down your Python code.
The 3x-4x memory size increase is likely because tracemalloc keeps a hash table mapping every live allocation to its traceback.
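The runtime cost of those per-allocation hooks is easy to measure directly; a rough sketch comparing an allocation-heavy loop with tracing off and on (the exact ratio varies by workload and Python version):

```python
import time
import tracemalloc

def churn(n=200_000):
    # allocation-heavy work: creates and discards many small objects
    return sum(len(str(i)) for i in range(n))

t0 = time.perf_counter()
churn()
plain = time.perf_counter() - t0

tracemalloc.start()
t0 = time.perf_counter()
churn()
traced = time.perf_counter() - t0
tracemalloc.stop()

print(f"slowdown with tracing: {traced / plain:.1f}x")
```

Increasing the number of frames stored per trace (`tracemalloc.start(nframes)`) makes both the slowdown and the memory overhead worse, since each allocation's traceback gets longer.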
Upvotes: 0
Reputation: 494
I've been trying to answer the same question. The best answer I've found is from
https://www.mail-archive.com/[email protected]/msg443129.html
which reports a factor of 3-4 increase in memory usage with tracemalloc, based on a simple experiment.
Upvotes: 4