Mei Zhang

Reputation: 1714

clean way to unpickle data saved with several pickler.dump calls

To unpickle data that was saved using several pickler.dump calls, we need the same number of calls to unpickler.load. One quick-and-dirty way to do it would be with a try/except block like this:

import pickle

with open("data.pk", "wb") as f:
    pickler = pickle.Pickler(f)
    pickler.dump("message1")
    pickler.dump("message2")
    pickler.dump("message3")

with open("data.pk", "rb") as f:
    unpickler = pickle.Unpickler(f)
    while True:
        try:
            loaded_data = unpickler.load()
        except Exception:
            break
        print("loaded:", loaded_data)

However, relying on exception handling for control flow looks like a cheap hack to me. Is that a good approach, or is there a better way? If I wanted to know the number of calls needed in advance, should I explicitly save it at the beginning of the file?

Upvotes: 1

Views: 59

Answers (2)

martineau

Reputation: 123473

As @Martijn Pieters said, there's really nothing wrong with your current approach. If you want to make the code that actually gets the data more readable (i.e. more concise and "cleaner"), you could create a generator function to hide the ugly details:

import pickle

def unpickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

# Create test file.
with open("data.pk", "wb") as f:
    pickler = pickle.Pickler(f)
    pickler.dump("message1")
    pickler.dump("message2")
    pickler.dump("message3")


loaded_data = list(unpickled_items("data.pk"))
print("saved data:", loaded_data)

Upvotes: 1

Martijn Pieters

Reputation: 1122322

That's a fine approach; you are simply reading all the pickles that the file has stored.

The alternative would be to first write the object count to the file:

import pickle

with open("data.pk", "wb") as f:
    f.write((3).to_bytes(2, 'big'))  # 2 bytes gives you enough room for expansion
    pickler = pickle.Pickler(f)
    pickler.dump("message1")
    pickler.dump("message2")
    pickler.dump("message3")

with open("data.pk", "rb") as f:
    count = int.from_bytes(f.read(2), 'big')
    unpickler = pickle.Unpickler(f)
    for i in range(count):
        loaded_data = unpickler.load()
        print("loaded:", loaded_data)

Upvotes: 1
