Reputation:
I am currently using a function that builds extremely large dictionaries (used to compare DNA strings) and sometimes I get a MemoryError. Is there a way to allot more memory to Python so it can deal with more data at once?
Upvotes: 27
Views: 118036
Reputation: 173
Python raises MemoryError when it runs into the limit of your system's RAM, unless you have set a lower limit manually with the resource module.
Defining your class with __slots__ tells the Python interpreter that the attributes/members of your class are fixed, and that can lead to significant memory savings.
Using __slots__ also stops the interpreter from creating a per-instance __dict__; attribute values are stored in fixed slots instead.
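A minimal sketch of the idea (the class and attribute names here are just placeholders, not from the question):

import sys

class DnaPlain:
    def __init__(self, key, sequence):
        self.key = key
        self.sequence = sequence

class DnaSlots:
    __slots__ = ('key', 'sequence')   # fixed attribute set: no per-instance __dict__

    def __init__(self, key, sequence):
        self.key = key
        self.sequence = sequence

plain = DnaPlain('k1', 'ACGT')
slotted = DnaSlots('k1', 'ACGT')
print(sys.getsizeof(plain.__dict__))   # the dict every plain instance carries around
print(hasattr(slotted, '__dict__'))    # False: slotted instances skip that dict entirely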
If the memory consumed by your Python process keeps growing over time, objects are probably being kept alive longer than necessary.
The best approach is to create a worker thread or a single-threaded pool to do the work, then invalidate/kill the worker to free the resources it holds.
The code below creates a single-thread worker:
import concurrent.futures
import logging
import threading

logger = logging.getLogger(__name__)

# __slots__ only has an effect inside a class body (see the sketch above);
# at module level we just need a shared lock and an error list.
lock = threading.Lock()
errorResultMap = []

def process_dna_compare(dna1, dna2):
    # max_workers=1 creates a single-thread pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        futures = {executor.submit(getDnaDict, lock, dna_key): dna_key
                   for dna_key in dna1}
        dna_differences_map = {}
        count = 0
        dna_processed = False
        for future in concurrent.futures.as_completed(futures):
            result_dict = future.result()
            if result_dict:
                count += 1
                # Do your processing XYZ here
    logger.info('Total dna keys processed %s', count)

def getDnaDict(lock, dna_key):
    '''Process dna_key here and return an item.'''
    try:
        dataItem = item[0]  # 'item' stands in for your own lookup result
        return dataItem
    except Exception:
        with lock:
            errorResultMap.append({'dna_key': dna_key,
                                   'error': 'No data for dna found'})
        logger.error('Error in processing dna: %s', dna_key)

if __name__ == "__main__":
    dna1 = {}  # get data for dna1
    dna2 = {}  # get data for dna2
    process_dna_compare(dna1, dna2)
    if errorResultMap:
        pass  # print or write errorResultMap to a file
The code below will help you understand memory usage:
import objgraph
import random
import inspect

class Dna(object):
    def __init__(self):
        self.val = None

    def __str__(self):
        return "dna - val: {0}".format(self.val)

def f():
    l = []
    for i in range(3):
        dna = Dna()
        # print("id of dna: {0}".format(id(dna)))
        # print("dna is: {0}".format(dna))
        l.append(dna)
    return l

def main():
    d = {}
    l = f()
    d['k'] = l
    print("list l has {0} objects of type Dna()".format(len(l)))
    objgraph.show_most_common_types()
    objgraph.show_backrefs(random.choice(objgraph.by_type('Dna')),
                           filename="dna_refs.png")
    objgraph.show_refs(d, filename='myDna-image.png')

if __name__ == "__main__":
    main()
Output for memory usage:
list l has 3 objects of type Dna()
function 2021
wrapper_descriptor 1072
dict 998
method_descriptor 778
builtin_function_or_method 759
tuple 667
weakref 577
getset_descriptor 396
member_descriptor 296
type 180
For more reading on slots, see: https://elfsternberg.com/2009/07/06/python-what-the-hell-is-a-slot/
Upvotes: 1
Reputation: 1686
Although Python itself doesn't limit your program's memory usage, the operating system applies dynamic CPU and RAM limits to every program to keep the whole machine performant.
I work on fractal graphics generation, which needs character arrays as large as possible (billions of elements) for fast generation, and I found that the per-process soft limits set by the OS reduce the real performance of the algorithm on the machine.
When you increase RAM and CPU usage (bigger buffers, longer loops, more threads), the overall processing speed drops: the number of useful threads and the thread frequency decrease, so the result takes longer to compute. But if you reduce resource usage to 50-75% of the old configuration (smaller buffer size, smaller loops, fewer threads, lower frequency or a longer threading timer), split the task into multiple parts, and run several console Python programs that process all the parts at the same time, the job finishes in much less time, and when you check CPU and RAM usage it is considerably higher than with the old single-program, multi-threaded method.
This means that when we build a program for high performance, speed, and giant data processing, we need to design it as multiple background programs, each running multiple threads. Also, bring disk space into the process rather than relying on memory alone.
Even before you reach the physical memory limit, handling giant data in one pass is slower and more limiting than splitting it into several parts, as sketched below.
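A rough sketch of that splitting idea using the standard multiprocessing module (the chunk size and the per-chunk work are placeholders; my own code further down uses separate console programs via subprocess instead):

from multiprocessing import Pool

def process_chunk(chunk):
    # Placeholder per-chunk work; return only a small summary, not the chunk itself
    return sum(len(item) for item in chunk)

def chunks(data, size):
    # Yield successive slices of the data
    for i in range(0, len(data), size):
        yield data[i:i + size]

if __name__ == "__main__":
    data = ["ACGT" * 100] * 10_000   # placeholder dataset
    with Pool(processes=4) as pool:
        results = pool.map(process_chunk, chunks(data, 1_000))
    print(sum(results))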
Optionally, if your program is a specialized application, consider the following: by creating multiple background programs you can build a high-speed graphics application that does not require the user to install a graphics card. This is also a good option if you want to distribute a program that integrates AI and costly computation to end users, since upgrading RAM in the 4-16 GB range is cheaper than a graphics card.
##//////////////////////////////////
##////////// RUNTIME PACK ////////// FOR PYTHON
##//
##// v2021.08.12 : add Handler
##//
##// Module by Phung Phan: [email protected]
##
##
import time
import threading

# JS equivalent for reference:
# var Handler = function(){this.post = function(r){r();};return this;}; Handler.post = function(r){r();}
ERUN = lambda: 0

def run(R): R()
def Run(R): R()
def RUN(R): R()

def delay(ms): time.sleep(ms / 1000.0)  # control while-loop speed

def delayF(R, delayMS):
    t = threading.Timer(delayMS / 1000.0, R)
    t.start()
    return t

def setTimeout(R, delayMS):
    t = threading.Timer(delayMS / 1000.0, R)
    t.start()
    return t

class THREAD:
    def __init__(this):
        this.R_onRun = None
        this.thread = None

    def run(this):
        this.thread = threading.Thread(target=this.R_onRun)
        this.thread.start()

    def isRun(this):
        return this.thread.is_alive()

AInterval = []

class setInterval:
    def __init__(this, R_onRun, msInterval):
        this.ms = msInterval
        this.R_onRun = R_onRun
        this.kStop = False
        this.kPause = False
        this.thread = THREAD()
        this.thread.R_onRun = this.Clock
        this.thread.run()
        this.id = len(AInterval)
        AInterval.append(this)

    def Clock(this):
        while not this.kPause:
            this.R_onRun()
            delay(this.ms)

    def pause(this):
        this.kPause = True

    def stop(this):
        this.kPause = True
        this.kStop = True
        AInterval[this.id] = None

    def resume(this):
        if this.kPause and not this.kStop:
            this.kPause = False
            this.thread.run()

def clearInterval(timer): timer.stop()

def clearAllInterval():
    for i in AInterval:
        if i is not None:
            i.stop()

def cycleF(R_onRun, msInterval): return setInterval(R_onRun, msInterval)

def stopCycleF(timer):
    if not isinstance(timer, str):
        try:
            timer.stop()
        except Exception:
            pass
##########
## END ### RUNTIME PACK ##########
##########
import subprocess

def process1(): subprocess.call("python process1.py", shell=True)
def process2(): subprocess.call("python process2.py", shell=True)
def process3(): subprocess.call("python process3.py", shell=True)
def process4(): subprocess.call("python process4.py", shell=True)

setTimeout(process1, 100)
setTimeout(process2, 100)
setTimeout(process3, 100)
setTimeout(process4, 100)
Upvotes: 0
Reputation: 2161
Try updating your Python from 32-bit to 64-bit.
Simply type python
on the command line and the startup banner will show which build you have; you can also check from inside Python, as shown below. The memory a 32-bit Python process can address is very limited (a few gigabytes at most).
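A quick way to check the build from Python itself (this only reports it, it does not change anything):

import struct
import sys

print(struct.calcsize("P") * 8)   # pointer size in bits: 32 or 64
print(sys.maxsize > 2**32)        # True on a 64-bit build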
Upvotes: -1
Reputation: 402713
Python doesn't limit the memory usage of your program. It will allocate as much memory as your program needs until your computer runs out of memory. The most you can do is reduce the limit to a fixed upper cap. That can be done with the resource
module, but it isn't what you're looking for.
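For reference, a minimal Unix-only sketch of lowering that cap (the 1 GiB figure is arbitrary):

import resource

# Cap this process's total address space at roughly 1 GiB (Unix-only).
# Allocations beyond the cap raise MemoryError sooner instead of later.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (1 * 1024 ** 3, hard))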
You'd need to look at making your code more memory/performance friendly.
Upvotes: 34