Fra

Reputation: 5188

Reduce memory footprint of python program

I'm developing a data-analysis worker in Python using numpy and pandas. I will deploy lots of these workers, so I want to keep each one lightweight.

I tried checking with this code:

import logging
import resource

logging.basicConfig(level=logging.DEBUG)

def printmemory(msg):
    # ru_maxrss is the peak resident set size; on Linux it is reported in
    # kilobytes (on macOS it is in bytes), so divide by 1000 to get MB
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.debug('%s: total memory: %.1f MB', msg, currentmemory / 1000.)

printmemory('begin')

#from numpy import array, nan, mean, std, sqrt, square
import numpy as np
printmemory('numpy')

import pandas as pd
printmemory('pandas')

and I found out that simply loading them into memory makes my worker pretty heavy. Is there a way to reduce the memory footprint of numpy and pandas?

Otherwise, any suggestion on a better solution?

Upvotes: 7

Views: 817

Answers (2)

firelynx

Reputation: 32214

I am unsure of what problem you want to tackle, but if you need to parallelize numpy, maybe pycuda could be something for you. numpy and pandas work well when parallelized with CUDA: numpy is only loaded once in memory, but the work is fired off in many parallel threads on the graphics card. Read some more about it here: https://developer.nvidia.com/pycuda
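For illustration, here is a minimal pycuda sketch (assuming a CUDA-capable GPU and the pycuda package installed; the kernel and array size are made up for the example). numpy is used only to stage the data on the host, while the actual work runs on the device:

import numpy as np
import pycuda.autoinit          # initializes the CUDA driver and a context
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Compile a trivial kernel that doubles every element of an array
mod = SourceModule("""
__global__ void double_array(float *a)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    a[i] *= 2.0f;
}
""")
double_array = mod.get_function("double_array")

a = np.ones(256, dtype=np.float32)
# InOut copies the array to the device and back after the kernel runs
double_array(drv.InOut(a), block=(256, 1, 1), grid=(1, 1))
print(a[:4])  # -> [ 2.  2.  2.  2.]

Note that this moves the computation to the GPU, not the import: the Python process still loads numpy once on the host.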

Upvotes: 0

eran

Reputation: 36

Sorry to tell you, but there is no way to load only part of a Python module into memory. You could use multi-threading if that applies to your case - threads share the same module memory.
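A minimal sketch of that idea (the worker function and thread count here are made up for illustration): numpy is imported once at module level, and every thread uses that single copy rather than re-importing it:

import threading
import numpy as np  # imported once; all threads share this one copy

def worker(seed):
    # each thread gets its own RNG, but the numpy module itself is shared
    rng = np.random.RandomState(seed)
    print(threading.current_thread().name, rng.rand(1000).mean())

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

With multiprocessing, by contrast, each child process gets its own interpreter and would carry its own copy of the imported modules (except where a fork-based start method shares pages copy-on-write).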

Upvotes: 2
