Reputation: 3848
I have an ASCII file that is essentially a grid of 16-bit signed integers; the file size on disk is approximately 300MB. I do not need to read the file into memory, but do need to store its contents as a single container (of containers), so for initial testing of memory use I tried lists and tuples as inner containers, with the outer container always a list built via a list comprehension:
with open(file, 'r') as f:
    for _ in range(6):
        t = next(f)  # skipping some header lines
    # Method 1
    grid = [line.strip().split() for line in f]  # produces a 3.3GB container
    # Method 2 (on another run)
    grid = [tuple(line.strip().split()) for line in f]  # produces a 3.7GB container
After discussing use of the grid amongst the team, I need to keep it as a list of lists up to a certain point, at which I will convert it to a list of tuples for program execution.
What I am curious about is how a 300MB file can have its lines stored in a container of containers and have its overall size be 10x the original raw file size. Does each container really occupy that much memory space for holding a single line each?
Upvotes: 4
Views: 155
Reputation: 21991
If you are concerned about storing data in memory and do not want to use tools outside of the standard library, you might want to take a look at the array module. It is designed to store numbers very efficiently in memory, and the array.array class accepts various type codes based on the characteristics of the numbers you want stored. The following is a simple demonstration of how you might want to adapt the module for your use:
#! /usr/bin/env python3
import array
import io
import pprint
import sys
CONTENT = '''\
Header 1
Header 2
Header 3
Header 4
Header 5
Header 6
0 1 2 3 4 -5 -6 -7 -8 -9
-9 -8 -7 -6 -5 4 3 2 1 0 '''

def main():
    with io.StringIO(CONTENT) as file:
        for _ in range(6):
            next(file)  # skip the six header lines
        # 'h' is the type code for signed 16-bit integers
        grid = tuple(array.array('h', map(int, line.split())) for line in file)
    print('Grid takes up', get_size_of_grid(grid), 'bytes of memory.')
    pprint.pprint(grid)


def get_size_of_grid(grid):
    # Size of the outer tuple plus the size of each inner array
    return sys.getsizeof(grid) + sum(map(sys.getsizeof, grid))


if __name__ == '__main__':
    main()
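
To see where the extra memory in the question's approach goes, here is a rough sketch comparing a list of string lists with a list of arrays built from the same text. The sample data and the deep_size helper are made up for illustration, and the numbers it prints depend on the interpreter (this assumes a typical 64-bit CPython). The point is that every token kept as a str is a separate Python object with its own header, and the list also stores a pointer to it, while array.array('h', ...) keeps each value inline in two bytes.

```python
import array
import sys


def deep_size(rows):
    """Approximate bytes used by a container of rows plus each row's elements."""
    total = sys.getsizeof(rows)
    for row in rows:
        total += sys.getsizeof(row)
        # array.array stores raw values inline; lists and tuples hold
        # pointers to separate objects, which are counted here as well.
        if not isinstance(row, array.array):
            total += sum(sys.getsizeof(item) for item in row)
    return total


# Hypothetical sample data: 100 rows of 1000 integers each
lines = [' '.join(str(n) for n in range(-500, 500)) for _ in range(100)]

as_str_lists = [line.split() for line in lines]
as_arrays = [array.array('h', map(int, line.split())) for line in lines]

raw_bytes = sum(len(line) + 1 for line in lines)  # text size incl. newlines
str_bytes = deep_size(as_str_lists)
array_bytes = deep_size(as_arrays)

print('raw text    :', raw_bytes)
print('list of str :', str_bytes)
print('array("h")  :', array_bytes)
```

If the rows have to be plain lists or tuples later on, array.array provides tolist(), so the conversion can be deferred until that point (the elements then become ordinary int objects rather than strings).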
Upvotes: 1