badc0re

Reputation: 3523

memory usage, how to free memory

I am using Python, and when indexing documents (for a search engine) it takes a lot of RAM. After I stop the indexing process the memory is still in use (around 8 GB of RAM). This is bad because I need my search engine to run all the time, not to reboot the OS whenever I finish indexing. Is there any efficient way to manage huge arrays, dictionaries and lists, and to free them? Any ideas?

I also saw some questions about this on Stack Overflow, but they are old:

Python memory footprint vs. heap size

Profile Memory Allocation in Python (with support for Numpy arrays)

Info:

free -t
             total       used       free     shared    buffers     cached
Mem:          5839       5724        114          0         15       1011
-/+ buffers/cache:       4698       1141
Swap:         1021        186        835
Total:        6861       5910        950


top | grep python 

 3164 root      20   0 68748  31m 1404 R   17  0.5  53:43.89 python                                                                     
 6716 baddc0re  20   0 84788  30m 1692 S    0  0.5   0:06.81 python     

 ps aux | grep python

root      3164 57.1  0.4  64876 29824 pts/0    R+   May27  54:23 python SE_doc_parse.py
baddc0re  6693  0.0  0.2  53240 16224 pts/1    S+   00:46   0:00 python index.py

uptime

01:02:40 up  1:43,  3 users,  load average: 1.22, 1.46, 1.39


sysctl vm.min_free_kbytes

vm.min_free_kbytes = 67584

The real problem is that when I start the script the indexing is fast, but as memory usage grows it gets slower and slower.

Document wikidoc_18784 added on 2012-05-28 01:03:46 "fast"
wikidoc_18784
-----------------------------------
Document wikidoc_21934 added on 2012-05-28 01:04:00 "slower"
wikidoc_21934
-----------------------------------
Document wikidoc_22903 added on 2012-05-28 01:04:01 "slower"
wikidoc_22903
-----------------------------------
Document wikidoc_20274 added on 2012-05-28 01:04:10 "slower"
wikidoc_20274
-----------------------------------
Document wikidoc_23013 added on 2012-05-28 01:04:53  "even slower"
wikidoc_23013

The documents are one or two pages of text at most. Indexing 10 pages takes about 2-3 seconds.

Thanks everyone for the help :)

Upvotes: 2

Views: 5124

Answers (3)

David Schwartz

Reputation: 182865

Your issue can't possibly be related to too much memory use. The more memory the system uses, the faster it runs. That's why we add memory to a system to improve its performance. If you think that using less memory will somehow make the system faster, take some memory out. That will force it to use less memory. But, not surprisingly, it will be slower if you do that.

The system keeps memory in use because it takes effort to make memory free. And there is no benefit, since free memory doesn't do anything. It's not like if you use half as much today, you can use twice as much tomorrow. If the system needs memory for something, it can easily just move memory directly from one use to another -- it doesn't need a lot of memory sitting around free.

Modern operating systems only keep a small amount of memory free to cope with certain types of unusual cases where they can't transition memory from one use to another. On Linux, you can find out how much free memory the system needs with this command: sysctl vm.min_free_kbytes. You'll probably find that's roughly how much free memory you have -- and that's good, because that's what the system needs.

So you don't need or want to free memory. You want to figure out why your system is slow.

Update: From your new information, it looks like SE_doc_parse.py is slamming the CPU hard. I would look at optimizing that code, if possible.

Update: It turned out to be an inefficient dictionary algorithm being used beyond the sizes it was intended to scale to, which was hogging the CPU.
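
For CPU-bound indexing like this, a profiler run over the entry point will usually point straight at the hot spot. A minimal sketch, assuming a hypothetical index_document() standing in for the real logic in SE_doc_parse.py:

import cProfile
import pstats

def index_document(doc_id):
    # placeholder for the real parse-and-index work
    return doc_id.upper()

# Run the indexing under the profiler and dump the stats to a file.
cProfile.run('index_document("wikidoc_18784")', 'index.prof')

# Show the ten functions with the largest cumulative time.
pstats.Stats('index.prof').sort_stats('cumulative').print_stats(10)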

Upvotes: 3

Jakob Bowyer

Reputation: 34718

From the discussion it seems you are storing the data in nothing but one giant dict (it's not often I get to say that with a straight face ;)). Offloading the data into a proper store such as Redis might reduce the memory usage of the Python process. It might also make your data more efficient and faster to work with.
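
A minimal sketch of that idea, assuming a local Redis server and the redis-py package; the key layout ("term:<word>" mapped to a set of document ids) is just one possible choice for illustration, not your actual schema:

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

def index_document(doc_id, text):
    # Each term -> document mapping lives in Redis, so the Python
    # process never holds the whole index in RAM.
    for word in text.lower().split():
        r.sadd('term:' + word, doc_id)

def lookup(word):
    # Return the ids of all documents containing the word.
    return r.smembers('term:' + word.lower())

index_document('wikidoc_18784', 'some page of wiki text')
print(lookup('wiki'))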

Upvotes: 3

Eric O. Lebigot

Reputation: 94595

I would guess that your program slows down because of at least one of the following reasons:

  • Your memory starts swapping, with data going from RAM to disk and vice versa. In this case the solution is indeed for your program to use less memory.
  • The algorithm that you use scales badly with the data size. In this case, finding a better algorithm is obviously the solution.

In both cases, we would need to see some of your code (what it essentially amounts to) in order to give a more specific solution.

Common solutions include

  • Using Python's del to indicate that a variable is not needed anymore.
  • Using iterators instead of lists (iterators do not use much memory); see the sketch after this list.
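
A minimal sketch of both ideas; parse() here is a hypothetical stand-in for the real document parser:

def parse(path):
    # hypothetical stand-in for the real document parser
    return path.upper()

def parsed_docs(paths):
    # Generator: only one parsed document is in memory at a time,
    # instead of a list holding all of them.
    for path in paths:
        yield parse(path)

big_index = {n: 'data' * 100 for n in range(100000)}
del big_index   # drop the only reference so the memory can be reused

for doc in parsed_docs(['wikidoc_18784', 'wikidoc_21934']):
    print(doc)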

Upvotes: 1
