This is my code to generate one CSV file per home id. Then I will analyze each home separately.
import numpy as np
import pandas as pd

data = pd.read_csv("110homes.csv")
for i in np.unique(data['dataid']):
    print(i)
    # keep only the rows belonging to this home
    d1 = data[data['dataid'] == i]
    d1.to_csv(str(i) + ".csv")
However, I am getting the following MemoryError, even though the machine has 200 GB of RAM:
data = pd.read_csv("110homes.csv")
File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 260, in _read
return parser.read()
File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 721, in read
ret = self._engine.read(nrows)
File "/usr/lib/python2.7/site-packages/pandas/io/parsers.py", line 1170, in read
data = self._reader.read(nrows)
File "pandas/parser.pyx", line 769, in pandas.parser.TextReader.read (pandas/parser.c:7544)
File "pandas/parser.pyx", line 819, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8137)
File "pandas/parser.pyx", line 1833, in pandas.parser._concatenate_chunks (pandas/parser.c:22383)
MemoryError
Data in RAM can take a lot more space than on disk. Without seeing your 110homes.csv file, it's impossible to know the details, but imagine that it consists of 10 floating-point numbers per line, like 0.0,1.0,2.0,... In the CSV, each value takes 3 bytes plus 1 byte for the delimiter. In Python, each takes 8 bytes for the float (on a 64-bit machine), plus 2 bytes per Unicode character (another 8 bytes), plus 8 bytes for the string length, plus 8 bytes per pointer, plus some overhead per row, and so on.
Think about it like this: on a 64-bit machine, the minimum size of a pointer, a native int, or a native float is 8 bytes. You need several of those per field, and several more per row, so taking 15x more space in RAM than on disk is nothing unusual.
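To make the per-value overhead concrete, here is a small sketch that compares one CSV field's size on disk with the CPython objects that can hold it in memory. It assumes a CPython interpreter; exact object sizes vary by Python version and platform, so treat the printed ratios as illustrative rather than exact.

```python
import struct
import sys

# One CSV field such as "1.0" costs 3 bytes plus 1 byte for the delimiter on disk.
field_text = "1.0"
on_disk = len(field_text) + 1

# Size of a pointer to a Python object (8 bytes on a 64-bit build).
pointer = struct.calcsize("P")

# The value parsed into a Python float object, plus the pointer that refers to it
# (a float object is typically 24 bytes on 64-bit CPython).
as_float = sys.getsizeof(float(field_text)) + pointer

# The value kept as a Python string object, plus its pointer; this is how text
# values are held in pandas "object" columns.
as_str = sys.getsizeof(field_text) + pointer

print("on disk: %d bytes" % on_disk)
print("float object + pointer: %d bytes (%.1fx)" % (as_float, as_float / float(on_disk)))
print("str object + pointer: %d bytes (%.1fx)" % (as_str, as_str / float(on_disk)))
```

Numeric columns that pandas parses into NumPy arrays cost closer to 8 bytes per value; the per-object overhead mostly shows up in columns read as text.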
Do a simple test: take the first 10% of the lines of your file and monitor Python with top as it processes them. See how much RAM it uses. Does it use at least 20 GB? If 10% of the file already needs 20 GB or more, the whole file will not fit in 200 GB.
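The same test can be scripted instead of watched in top: read only the first 10% of the rows with read_csv's nrows parameter, ask pandas how much memory that sample occupies, and extrapolate linearly. This is a rough sketch that assumes every row is about the same size. The demo writes a small hypothetical stand-in file (its localminute and use columns are made up) instead of the real 110homes.csv. memory_usage(deep=True) measures the finished DataFrame, not the parser's temporary buffers, so peak usage during read_csv can be higher.

```python
import os
import tempfile

import pandas as pd


def count_data_rows(path):
    """Count data rows in a CSV file (all lines minus the header)."""
    with open(path) as f:
        return sum(1 for _ in f) - 1


def estimate_full_memory(path, fraction=0.1):
    """Load about `fraction` of the rows and extrapolate the RAM for all rows.

    Assumes rows are similar in size, so memory grows linearly with row count.
    """
    total_rows = count_data_rows(path)
    nrows = max(1, int(round(total_rows * fraction)))
    sample = pd.read_csv(path, nrows=nrows)
    # deep=True also counts the Python string objects in object (text) columns
    sample_bytes = sample.memory_usage(deep=True).sum()
    return sample_bytes, sample_bytes * total_rows / float(nrows)


# Hypothetical stand-in for 110homes.csv so the sketch runs anywhere.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("dataid,localminute,use\n")
    for i in range(1000):
        f.write("%d,2014-01-01 00:%02d:00,%.3f\n" % (i % 110, i % 60, i * 0.5))

sample_bytes, estimate = estimate_full_memory(path, fraction=0.1)
print("file on disk: %d bytes" % os.path.getsize(path))
print("10%% sample in RAM: %d bytes" % sample_bytes)
print("estimated full load: %.0f bytes" % estimate)
os.remove(path)
```

Run against the real file, an estimate far beyond the machine's RAM confirms the explanation above.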