dumbledad

Reputation: 17527

pandas memory error on large RAM machine but not on smaller RAM machine: same code, same data

I run the following on two of my machines:

import os, sqlite3
import pandas as pd
from feat_transform import filter_anevexp
db_path = r'C:\Users\timregan\Desktop\anondb_280718.sqlite3'
db = sqlite3.connect(db_path)
anevexp_df = filter_anevexp(db, 0)

On my laptop (with 8GB of RAM) this runs without issue (although the call out to filter_anevexp takes a few minutes). On my desktop (with 128GB of RAM) it fails in pandas with a memory error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\timregan\source\MentalHealth\code\preprocessing\feat_transform.py", line 171, in filter_anevexp
    anevexp_df = anevexp_df[anevexp_df["user_id"].isin(df)].copy()
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\frame.py", line 2724, in _getitem_array
    return self._take(indexer, axis=0)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\generic.py", line 2789, in _take
    verify=True)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4539, in take
    axis=axis, allow_dups=True)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4425, in reindex_indexer
    for blk in self.blocks]
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 4425, in <listcomp>
    for blk in self.blocks]
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\internals.py", line 1258, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "C:\Users\timregan\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pandas\core\algorithms.py", line 1655, in take_nd
    out = np.empty(out_shape, dtype=dtype)
MemoryError

Is there anything special I need to do to prevent errors (e.g. addressing errors) on machines with lots of memory?

N.B. I have not included the code of the filter_anevexp function because I am not interested in advice on how to reduce its memory footprint. I want to understand why the same code, running on the same data, fails with a memory error on a 128GB RAM machine while it succeeds on an 8GB RAM machine.

Upvotes: 2

Views: 307

Answers (1)

eljiwo

Reputation: 846

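You are using a 32-bit build of Python on your desktop (note the Python37-32 directory in every path of the traceback). A 32-bit process can address at most 4 GB of memory no matter how much RAM is installed, and on Windows only 2 GB of that is normally available to the process, so the 128 GB in your desktop is out of reach. Your laptop is presumably running a 64-bit Python, which is why the same code succeeds there. Reinstall Python 3.7 as the 64-bit build instead of the 32-bit one you are currently using.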
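To confirm which build each machine is actually running, you can run a quick check like the one below in each interpreter. This is a minimal sketch using only the standard library; the helper names interpreter_bits and is_64bit are illustrative, not part of any library.

```python
import platform
import struct
import sys


def interpreter_bits() -> int:
    """Return the pointer width of the running Python interpreter in bits (32 or 64)."""
    # struct.calcsize("P") is the size of a C void* in bytes: 4 on 32-bit builds, 8 on 64-bit builds.
    return struct.calcsize("P") * 8


def is_64bit() -> bool:
    """Cross-check via sys.maxsize: 2**31 - 1 on 32-bit builds, 2**63 - 1 on 64-bit builds."""
    return sys.maxsize > 2**32


if __name__ == "__main__":
    bits = interpreter_bits()
    print(f"Python {platform.python_version()} is a {bits}-bit build")
    print(f"Executable: {sys.executable}")
    if bits == 32:
        # A 32-bit process cannot address more than 4 GB, however much RAM is installed.
        print("Warning: 32-bit interpreter; large pandas operations may raise MemoryError.")
```

If the desktop reports 32 and the laptop reports 64, that accounts for the difference in behaviour.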

Upvotes: 6
