Dhara

Reputation: 6767

Unpickling large objects stored on network drives

I have large (~75 MB) pickled objects made available on mapped network drives (e.g. X:/folder1/large_pickled_item.pk). The objects contain NumPy arrays and Python lists, and are pickled using cPickle, protocol 2.

When I try to unpickle the data, I get the following error messages:

Using pickle: KeyError: (random character)

Using cPickle: IOError: [Errno 22] Invalid argument

I do not get errors if the pickled objects are smaller in size, or if I copy the (larger) objects to a local drive and run the same script.

Any idea where the problem lies? Is it a python+pickle problem or a windows shares issue?

Notes:

  1. I am using Python 2.7.2 on Windows XP Professional (SP3)
  2. I do not have control over the object format; I do not create the objects, I can only read them
  3. Example stack trace:

    File "test.py", line 38, in getObject
      obj = pickle.load(input)
    File "C:\software\python\lib\pickle.py", line 1378, in load
      return Unpickler(file).load()
    File "C:\software\python\lib\pickle.py", line 858, in load
      dispatch[key]
    KeyError: '~'

Solution

  1. Read the file in chunks of 67076095 bytes (just under 64 MB) into a string buffer.
  2. Call pickle.loads on the string buffer instead of pickle.load on the file object.

Upvotes: 1

Views: 728

Answers (1)

NPE

Reputation: 500437

This is due to a Windows bug whereby reading and writing network files in chunks larger than 64 MB fails.

I suggest trying the mirror image of the workaround presented in https://stackoverflow.com/a/4228291/367273

If that doesn't help, perhaps you could create a wrapper for the file object that would automatically split every large read() into multiple smaller reads, and present that wrapper to the pickle module?

Upvotes: 1

Related Questions