I use a Python dict to store the contents of a data file with more than 550k keys (the file is almost 29M). However, after reading the file, the process uses more than 70M of memory, which seems abnormal.
Why does this happen?
Below is the function that reads the data file:
```python
import os
import time

# logger, settings and instrumentation come from the surrounding application (not shown).
def _update_internal_metrics(self, signum, _):
    """Read the dumped metrics file"""
    logger.relayindex('reload dumped file begins')
    dumped_metrics_file_path = os.path.join(settings.DATA_DIR,
                                            settings.DUMPED_METRICS_FILE)
    epoch = int(time.time())
    try:
        new_metrics = {}
        with open(dumped_metrics_file_path) as dumped_metrics_file:
            # One metric name per line; every name maps to the same timestamp.
            for line in dumped_metrics_file:
                line = line.strip()
                new_metrics[line] = epoch
    except Exception:
        if not signum:
            self._reload_dumped_file()
        logger.relayindex("Dumped metrics file does not exist or can"
                          "not be read. No update")
    else:
        settings["metrics"] = new_metrics
        instrumentation.increment('dumped.Reload')
    logger.relayindex('reload dumped file ends')
```
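First of all, `top` isn't the right way to check this, as it reports the memory consumption of the whole process. You can use `getsizeof` from the `sys` module:

```python
sys.getsizeof(new_metrics)
```

Note that this measures only the dict's own hash table, not the key strings and integer values it references.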
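To see how much the keys and values add on top of what `sys.getsizeof` reports for the dict itself, here is a minimal sketch that sums the sizes of the dict, each key, and each distinct value object. The helper name `shallow_and_deep_size` and the generated metric names are hypothetical, and the result is an approximation (it ignores allocator overhead and anything nested inside the values), not an exact accounting.

```python
import sys

def shallow_and_deep_size(d):
    """Return (shallow, deep) sizes in bytes for a flat dict.

    shallow: sys.getsizeof(d), i.e. only the dict's own hash table.
    deep: shallow plus every key and every *distinct* value object, so a
    value shared by many keys (like a single epoch int) is counted once.
    """
    shallow = sys.getsizeof(d)
    deep = shallow
    seen = set()  # ids of objects already counted
    for key, value in d.items():
        for obj in (key, value):
            if id(obj) not in seen:
                seen.add(id(obj))
                deep += sys.getsizeof(obj)
    return shallow, deep

# Hypothetical data shaped like the question's: many names, one shared timestamp.
epoch = 1_400_000_000
metrics = {"carbon.metric.%d" % i: epoch for i in range(10_000)}
shallow, deep = shallow_and_deep_size(metrics)
print(shallow, deep)
```

Because every value in the question's dict is the same `epoch` object, the values contribute almost nothing beyond the pointers stored in the table; the key strings dominate.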
Second, there is some overhead associated with both strings and hash tables:

```python
sys.getsizeof('')
```

On my system this is 24 bytes, and this per-object overhead is the same regardless of the string's length. With 550k keys that's about 13M of overhead (550,000 × 24 bytes).
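The 13M figure is just the per-object overhead multiplied by the number of keys. A small sketch of that arithmetic, plus a check of the overhead on whatever interpreter you run it on (the helper names are made up for illustration; on 64-bit CPython 3 the empty-string overhead is larger than the 24 bytes quoted above, and for ASCII text each additional character adds one byte):

```python
import sys

def per_string_overhead():
    """Bytes sys.getsizeof reports for an empty str on this interpreter."""
    return sys.getsizeof("")

def string_overhead_total(n_keys, overhead_bytes):
    """Total fixed overhead for n_keys separate string objects."""
    return n_keys * overhead_bytes

# The answer's figures: 24 bytes per string, 550k keys.
print(string_overhead_total(550_000, 24))  # 13200000, i.e. about 13M

# Overhead on the current interpreter, and the extra cost of 100 ASCII chars.
local = per_string_overhead()
print(local, sys.getsizeof("x" * 100) - local)
```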
Python tries to keep hash tables from getting too dense, as that would kill lookup time. AFAIK the CPython implementation uses a 2x growth factor, with 2^k table sizes. As your key count is just above a power of two (`math.log(550000, 2)` ≈ 19.07), the table is relatively sparse, with 2 ** 20 = 1048576 slots. On your 64-bit system, each slot holds an 8-byte pointer to its key string, which is an additional 8M of overhead. You also store integers, which weren't in the original file: each slot holds an 8-byte pointer to its value (all of yours point to the same `epoch` int object), another 8M. Each slot also contains the stored hash value (another 8M). See the source of `PyDictEntry`.
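The table-size reasoning can be checked with a simplified model. This sketch assumes CPython resizes once a dict is about 2/3 full and always uses power-of-two table sizes, and it sizes entries like the Python 2 `PyDictEntry` (stored hash, key pointer, value pointer, 8 bytes each on a 64-bit build). Python 3.6+ uses a more compact dict layout, so the byte counts there differ. The function names are hypothetical.

```python
import math

def dict_table_slots(n_keys, min_slots=8):
    """Smallest power-of-two table that keeps n_keys under a 2/3 load factor.

    Simplified model (assumption): the real table size after a resize can
    be larger than this minimum, depending on the growth policy.
    """
    slots = min_slots
    while n_keys * 3 >= slots * 2:  # at or above 2/3 full: grow
        slots *= 2
    return slots

def entry_table_bytes(slots, word=8):
    """Bytes for Python 2 style entries: (hash, key pointer, value pointer)."""
    return slots * 3 * word

n = 550_000
print(math.log2(n))                      # a little over 19
slots = dict_table_slots(n)
print(slots)                             # 1048576
print(entry_table_bytes(slots) / 2**20)  # 24.0, i.e. three 8M arrays
```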
Adding it all up (29M of key data + 13M of string overhead + 3 × 8M for the hash table) gives about 66M, and of course the rest of your Python app needs some memory too. It all looks fine to me.
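Putting the pieces together as a back-of-the-envelope estimate: the helper below is hypothetical, and it treats the 29M file size as 29 MiB of key text, which slightly overstates it because `strip()` removes the newlines.

```python
def estimate_total(data_bytes, n_keys, str_overhead, slots, word=8):
    """Rough total: raw key text + per-string overhead + hash table.

    The hash table is modelled as three word-sized arrays of `slots`
    entries (stored hash, key pointer, value pointer).
    """
    string_overhead = n_keys * str_overhead
    table = 3 * slots * word
    return data_bytes + string_overhead + table

total = estimate_total(data_bytes=29 * 2**20, n_keys=550_000,
                       str_overhead=24, slots=2**20)
print(round(total / 2**20, 1))  # 65.6 (MiB)
```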