geekchic

Reputation: 1566

Protobuf Exception When Deserializing Large File

I'm using protobuf to serialize large objects to binary files so that they can be deserialized and used again at a later date. However, deserializing some of the larger files fails. The files are roughly 2.3 GB in size, and when I try to deserialize them, several exceptions are thrown (in the following order):

I've looked at the question referenced in the second exception, but that doesn't seem to cover the problem I'm having.

I'm using Microsoft's HPC Pack to generate these files (they take a while), so the serialization looks like this:

    using (var consoleStream = Console.OpenStandardOutput())
    {
        Serializer.Serialize(consoleStream, dto);
    }

And I'm reading the files in as follows:

    private static T Deserialize<T>(string file)
    {
        using (var fs = File.OpenRead(file))
        {
            return Serializer.Deserialize<T>(fs);
        }
    }

The files are of two different types. One is about 1 GB in size, the other about 2.3 GB. The smaller files all work; the larger files do not. Any ideas what could be going wrong here? I realise I've not given a lot of detail, but I can give more as requested.

Upvotes: 2

Views: 2262

Answers (1)

Marc Gravell

Reputation: 1063338

Here I need to refer to a recent discussion on the protobuf list:

Protobuf uses int to represent sizes so the largest size it can possibly support is <2G. We don't have any plan to change int to size_t in the code. Users should avoid using overly large messages.

I'm guessing that the cause of the failure inside protobuf-net is basically the same. I can probably change protobuf-net to support larger files, but I have to advise that this is not recommended, because it looks like no other implementation is going to work well with such huge data.
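To make the limit concrete, here is a quick check of the two file sizes from the question against the largest value a signed 32-bit int can hold. It assumes the quoted sizes are binary gigabytes; the conclusion is the same with decimal gigabytes. The class and variable names are only for illustration.

```csharp
using System;

class SizeCheck
{
    static void Main()
    {
        // Assumption: "GB" in the question means 2^30 bytes.
        const long GiB = 1024L * 1024L * 1024L;

        long small = 1 * GiB;           // the ~1 GB files that work
        long large = (long)(2.3 * GiB); // the ~2.3 GB files that fail

        // int.MaxValue is 2,147,483,647, i.e. just under 2 GiB.
        Console.WriteLine(small <= int.MaxValue); // True
        Console.WriteLine(large <= int.MaxValue); // False
    }
}
```

This matches the observed behaviour: the 1 GB files fit within an int-sized length, the 2.3 GB files do not.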

The fix is probably just a case of changing a lot of int to long in the reader/writer layer. But: what is the layout of your data? If there is an outer object that is basically a list of the actual objects, there is probably a sneaky way of doing this using an incremental reader (basically, spoofing the repeated support directly).
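A rough sketch of what the incremental approach could look like, assuming the outer object is nothing more than a list stored in field 1. `Item`, its members, `IncrementalIo`, and the field number are hypothetical stand-ins for the real types; `Serializer.SerializeWithLengthPrefix` and `Serializer.DeserializeItems` are existing protobuf-net methods. Each item is written with a Base128 length prefix and a field header, which is the same wire layout as a repeated field on an outer message, so the reader can pull items out one at a time instead of materialising the whole list.

```csharp
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hypothetical element type; substitute the real item contract.
[ProtoContract]
public class Item
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class IncrementalIo
{
    // Assumption: the list lives in field 1 of the outer object.
    private const int ListField = 1;

    // Writes each item as its own length-prefixed record, which on the wire
    // looks like one entry of a repeated field with number ListField.
    public static void WriteItems(Stream destination, IEnumerable<Item> items)
    {
        foreach (var item in items)
        {
            Serializer.SerializeWithLengthPrefix(destination, item, PrefixStyle.Base128, ListField);
        }
    }

    // Lazily reads items one at a time; the file stays open while enumerating.
    public static IEnumerable<Item> ReadItems(string file)
    {
        using (var fs = File.OpenRead(file))
        {
            foreach (var item in Serializer.DeserializeItems<Item>(fs, PrefixStyle.Base128, ListField))
            {
                yield return item;
            }
        }
    }
}
```

With this layout only each individual item has to stay under the int-sized limit, not the file as a whole. Whether a given protobuf-net version tracks stream positions safely past 2 GB is not something this sketch verifies, so treat it as a starting point rather than a confirmed fix.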

Upvotes: 1
