Reputation: 5188
I'm developing a data analysis worker in Python using numpy and pandas. I will deploy lots of these workers, so I want to keep each one lightweight.
I tried checking the memory usage with this code:
import logging
import resource

logging.basicConfig(level=logging.DEBUG)

def printmemory(msg):
    # ru_maxrss is the peak resident set size; on Linux it is reported in kilobytes
    currentmemory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.debug('%s: total memory: %.1f MB', msg, currentmemory / 1000.)

printmemory('begin')

#from numpy import array, nan, mean, std, sqrt, square
import numpy as np
printmemory('numpy')

import pandas as pd
printmemory('pandas')
and I found out that simply importing them makes my worker pretty heavy. Is there a way to reduce the memory footprint of numpy and pandas?
Otherwise, any suggestions for a better solution?
Upvotes: 7
Views: 817
Reputation: 32214
I am unsure of what problem you want to tackle, but if you need to parallelize numpy, maybe pycuda could be something for you. numpy and pandas work well when parallelized with CUDA: numpy would only be loaded once in host memory, while the work is fired off in many parallel threads on the graphics card (see the sketch below). Read some more about it here: https://developer.nvidia.com/pycuda
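As a rough illustration only (not your worker's code, and it assumes a CUDA-capable GPU plus pycuda installed), this is what offloading a numpy array to the card with pycuda's gpuarray module looks like:

import numpy as np
import pycuda.autoinit          # importing this sets up a CUDA context
import pycuda.gpuarray as gpuarray

host_data = np.random.rand(1000000).astype(np.float32)  # ordinary numpy array on the host
device_data = gpuarray.to_gpu(host_data)                # copy it into GPU memory
squared = (device_data * device_data).get()             # elementwise multiply on the GPU, copy result back
print(squared[:5])

The numpy module itself is still imported once per process on the host; only the array data and the arithmetic move to the GPU.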
Upvotes: 0
Reputation: 36
Sorry to tell you, but there is no way to load only part of a Python module into memory. You could use multithreading if that applies to your case: threads within one process share the same imported modules, so numpy and pandas are loaded only once no matter how many worker threads you run.
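A minimal sketch of that idea (my own example, not tuned to your workload): one process pays the import cost for numpy and pandas a single time, and several worker threads reuse the already-loaded modules.

import threading
import numpy as np
import pandas as pd

def worker(task_id):
    # every thread sees the same already-imported numpy/pandas modules
    df = pd.DataFrame({'x': np.random.rand(1000)})
    print(task_id, df['x'].mean())

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Keep the GIL in mind, though: this saves memory compared to one process per worker, but pure-Python parts of the work won't run in parallel; only the numpy/pandas operations that release the GIL will.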
Upvotes: 2