ooolllooo

Reputation: 351

Why can't I write a DataFrame to the database?

I have 32 GB of RAM and I use Jupyter and pandas. My DataFrame isn't very big, but when I try to write it to an Arctic database I get a "MemoryError":

df_q.shape
(157293660, 10)
def memory(df):
    mem = df.memory_usage(index=True).sum() / (1024 ** 3)
    print(mem)
memory(df_q)
12.8912200034

Then I try to write it:

from arctic import Arctic
import arctic as arc
store = Arctic('.....')
lib = store['myLib']
lib.write('quotes', df_q)

MemoryError                               Traceback (most recent call last)
in ()
      1 memory(df_q)
----> 2 lib.write('quotes', df_q)

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48         while True:
     49             try:
---> 50                 return f(*args, **kwargs)
     51             except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52                 # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/version_store.pyc in write(self, symbol, data, metadata, prune_previous_version, **kwargs)
    561
    562         handler = self._write_handler(version, symbol, data, **kwargs)
--> 563         mongo_retry(handler.write)(self._arctic_lib, version, symbol, data, previous_version, **kwargs)
    564
    565         # Insert the new version into the version DB

/usr/local/lib/python2.7/dist-packages/arctic/decorators.pyc in f_retry(*args, **kwargs)
     48         while True:
     49             try:
---> 50                 return f(*args, **kwargs)
     51             except (DuplicateKeyError, ServerSelectionTimeoutError) as e:
     52                 # Re-raise errors that won't go away.

/usr/local/lib/python2.7/dist-packages/arctic/store/_pandas_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version)
    301     def write(self, arctic_lib, version, symbol, item, previous_version):
    302         item, md = self.to_records(item)
--> 303         super(PandasDataFrameStore, self).write(arctic_lib, version, symbol, item, previous_version, dtype=md)
    304
    305     def append(self, arctic_lib, version, symbol, item, previous_version):

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in write(self, arctic_lib, version, symbol, item, previous_version, dtype)
    385         version['type'] = self.TYPE
    386         version['up_to'] = len(item)
--> 387         version['sha'] = self.checksum(item)
    388
    389         if previous_version:

/usr/local/lib/python2.7/dist-packages/arctic/store/_ndarray_store.pyc in checksum(self, item)
    370     def checksum(self, item):
    371         sha = hashlib.sha1()
--> 372         sha.update(item.tostring())
    373         return Binary(sha.digest())
    374

MemoryError:

What is going on? If I use df_q.to_csv() instead, the write will take far too long.

Upvotes: 1

Views: 145

Answers (1)

user6530460

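The DataFrame itself fits in memory, so this is not simply a case of the data being too big. It is also not a server problem, even though the traceback mentions one. Reading the frames in order:

1st frame: The retry decorator (f_retry) calls the wrapped function. ServerSelectionTimeoutError is only named in the except clause of the displayed source; it was not raised.

2nd frame: version_store.write passes the data to the write handler through mongo_retry.

3rd frame: The retry decorator again, with the same except clause in view.

The remaining frames convert the DataFrame to a record array and compute its checksum. The last frame is where the MemoryError is actually raised: sha.update(item.tostring()). tostring() returns a complete copy of the array's bytes, so at that point the process holds the DataFrame, the record array, and the byte string at the same time. At roughly 12.9 GiB per copy, that can exceed 32 GB of RAM. So essentially your problem lies in the Arctic package itself, in how it computes the checksum. df_q.to_csv() does not go through this step, but it is very slow since it is not optimized like Arctic. I would suggest trying to reinstall the Arctic package, although that will only help if the installed version handles the checksum differently.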
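The last frame of the traceback in the question is sha.update(item.tostring()), and tostring() returns a full copy of the array's bytes. As a rough illustration only, the sketch below does two things with the standard library: it cross-checks the 12.89 figure printed in the question, and it shows that a SHA-1 digest computed chunk by chunk matches the one-shot digest, so a checksum does not in principle need a second full copy of the data. This is not Arctic's code. The helper names (footprint_gib, sha1_one_shot, sha1_chunked) are made up for the example, and the assumption that every column and the index use 8 bytes per row is inferred from the printed number, not stated in the question.

```python
import hashlib

GIB = 1024 ** 3


def footprint_gib(n_rows, values_per_row, bytes_per_value=8):
    """Estimate the size in GiB of n_rows rows of fixed-width values.

    Assumption: every value (10 columns plus the index) takes 8 bytes.
    """
    return n_rows * values_per_row * bytes_per_value / GIB


def sha1_one_shot(buf):
    """Hash a buffer the way the traceback does: copy everything, then hash."""
    # bytes(buf) materializes a complete second copy of the data.
    return hashlib.sha1(bytes(buf)).hexdigest()


def sha1_chunked(buf, chunk_size=1 << 20):
    """Hash the same buffer in slices, without building a full copy."""
    sha = hashlib.sha1()
    view = memoryview(buf)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        sha.update(view[start:start + chunk_size])
    return sha.hexdigest()


if __name__ == "__main__":
    one_copy = footprint_gib(157293660, 11)  # 10 columns + index
    print("one copy:     %.4f GiB" % one_copy)
    print("three copies: %.4f GiB" % (3 * one_copy))

    data = bytearray(range(256)) * 4096  # 1 MiB stand-in for the record array
    print(sha1_one_shot(data) == sha1_chunked(data, chunk_size=65537))
```

Under the 8-byte assumption, one copy works out to the same 12.89 GiB that memory(df_q) printed, and three simultaneous copies would need about 38.7 GiB, which is more than the 32 GB available. Whether a given Arctic version hashes in one shot or in chunks has to be checked in its source.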

Upvotes: 0
