I'm trying to use MessagePack to save multiple lists of structs because I read that its performance is better than BinaryFormatter serialization.
What I want to do is receive real-time time series data and periodically save (append) it to disk, for example whenever a list reaches 100 elements. My questions are:
1) Is it better to serialize the lists of structs and save them to disk asynchronously in this scenario?
2) How do I simply save them to disk with MessagePack?
public struct struct_realTime
{
    public int indexNum { get; set; }
    public string currentTime { get; set; }
    public string currentType { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        List<struct_realTime> list_temp = new List<struct_realTime>(100000);
        for (int num = 0; num < 100000; num++)
        {
            list_temp.Add(new struct_realTime
            {
                indexNum = 1,
                currentTime = "time",
                currentType = "type",
            });
        }
        string filename = "file.bin";
        using (var fileStream = new FileStream(filename, FileMode.Append, FileAccess.Write))
        {
            byte[] bytes = MessagePackSerializer.Serialize(list_temp);
            Console.WriteLine(MessagePackSerializer.ToJson(bytes));
        }
    }
}
When I run this code, it creates file.bin and prints out 100000 structs, but the file is 0 bytes.
When I use BinaryFormatter, I do this:
using (var fileStream = new FileStream("file.bin", FileMode.Append))
{
    BinaryFormatter formatter = new BinaryFormatter();
    formatter.Serialize(fileStream, list_temp);
}
How can I fix the problem?
What you are trying to do is to append an object (here List<struct_realTime>) serialized using MessagePackSerializer to a file containing an already-serialized sequence of similar objects, in the same way it is possible with BinaryFormatter, protobuf-net or Json.NET. Later, you presumably want to be able to deserialize the entire sequence into a list or array of objects of the same type.
Your code has three problems, two simple and one fundamental.
The simple problems are as follows:
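You don't actually write to the fileStream. The bytes returned by MessagePackSerializer.Serialize(list_temp) are only converted to JSON and printed, which is why file.bin stays empty. Instead, serialize directly into the stream: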
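// Append each list_temp sequentially.
// FileMode.Append positions the stream at the end of the file (it requires FileAccess.Write);
// FileMode.OpenOrCreate would start at position 0 and overwrite earlier data.
using (var fileStream = new FileStream(filename, FileMode.Append, FileAccess.Write))
{
    MessagePackSerializer.Serialize(fileStream, list_temp);
}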
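Alternatively, if you want to keep the byte[] produced by MessagePackSerializer.Serialize(list_temp) as in the question, you only need to write it to the stream yourself. The following is a minimal stdlib-only sketch. AppendBytes is a hypothetical helper name, and the payloads in the check are stand-ins for serialized bytes, so the example runs without the MessagePack package.

```csharp
using System;
using System.IO;

public static class RawAppend
{
    // Hypothetical helper: appends an already-serialized payload to the end of a file,
    // creating the file if it does not exist yet.
    public static void AppendBytes(string path, byte[] payload)
    {
        if (payload == null)
            throw new ArgumentNullException(nameof(payload));

        // FileMode.Append seeks to the end of the file and requires FileAccess.Write.
        using (var fileStream = new FileStream(path, FileMode.Append, FileAccess.Write))
        {
            // This is the step missing from the question: the bytes must actually be written.
            fileStream.Write(payload, 0, payload.Length);
        }
    }
}
```

This fixes only the 0-byte file. Reading several appended lists back is a separate problem, which the rest of this answer addresses.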
You haven't marked struct_realTime with the [MessagePackObject] and [Key] attributes that MessagePackSerializer's default resolver requires. This can be done, e.g., as follows:
[MessagePackObject]
public struct struct_realTime
{
    [Key(0)]
    public int indexNum { get; set; }
    [Key(1)]
    public string currentTime { get; set; }
    [Key(2)]
    public string currentType { get; set; }
}
Having done that, you can now repeatedly serialize list_temp to a file... but you will not be able to read the lists back afterwards! That's because MessagePackSerializer seems to consume the entire remaining stream when deserializing the root object, silently skipping any additional data appended to the file. Thus code like the following will fail, because only one object gets read from the file:
List<List<struct_realTime>> allItemsInFile = new List<List<struct_realTime>>();
using (var fileStream = File.OpenRead(filename))
{
    while (fileStream.Position < fileStream.Length)
    {
        allItemsInFile.Add(MessagePackSerializer.Deserialize<List<struct_realTime>>(fileStream));
    }
}
Assert.IsTrue(allItemsInFile.Count == expectedNumberOfRootItemsInFile);
Demo fiddle #1 here.
And code like the following will fail because the (first) root object in the stream is not an array of arrays of objects, but rather just a single array:
List<List<struct_realTime>> allItemsInFile;
using (var fileStream = File.OpenRead(filename))
{
    allItemsInFile = MessagePackSerializer.Deserialize<List<List<struct_realTime>>>(fileStream);
}
Assert.IsTrue(allItemsInFile.Count == expectedNumberOfRootItemsInFile);
Demo fiddle #2 here.
As MessagePackSerializer seems to lack the ability to deserialize multiple root objects from a stream, what are your options? First, you could deserialize a List<List<struct_realTime>>, append to it, and then serialize the entire thing back to the file. Presumably you don't want to do that for performance reasons, since every append would rewrite the whole file.
Second, using the MessagePack specification directly, you could manually seek to the beginning of the file to parse and rewrite an appropriate array 32 format header, then seek to the end of the file and use MessagePackSerializer to serialize and append the new item. The following extension method does the job:
using System;
using System.IO;
using MessagePack;

public static class MessagePackExtensions
{
    const byte Array32 = 0xdd;          // MessagePack "array 32" type byte
    const int Array32HeaderLength = 5;  // marker byte + 32-bit big-endian element count

    // Appends item as one more element of the array 32 that forms the root of the stream,
    // creating that root array if the stream is empty.
    public static void AppendToFile<T>(this Stream stream, T item)
    {
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek)
            throw new ArgumentException("!stream.CanSeek");

        // Read the existing root header, if any.
        stream.Position = 0;
        var buffer = new byte[Array32HeaderLength];
        var read = stream.Read(buffer, 0, Array32HeaderLength);
        stream.Position = 0;
        if (read == 0)
        {
            // Empty stream: start a new root array containing one element.
            FormatArray32Header(buffer, 1);
            stream.Write(buffer, 0, Array32HeaderLength);
        }
        else
        {
            // Existing root array: rewrite its header in place with the count incremented.
            var count = ParseArray32Header(buffer, read);
            FormatArray32Header(buffer, count + 1);
            stream.Write(buffer, 0, Array32HeaderLength);
        }

        // Serialize the new element after everything already in the stream.
        stream.Position = stream.Length;
        MessagePackSerializer.Serialize(stream, item);
    }

    static void FormatArray32Header(byte[] buffer, uint value)
    {
        buffer[0] = Array32;
        buffer[1] = unchecked((byte)(value >> 24));
        buffer[2] = unchecked((byte)(value >> 16));
        buffer[3] = unchecked((byte)(value >> 8));
        buffer[4] = unchecked((byte)value);
    }

    static uint ParseArray32Header(byte[] buffer, int readCount)
    {
        if (readCount < Array32HeaderLength || buffer[0] != Array32)
            throw new ArgumentException("Stream was not positioned on an Array32 header.");
        int i = 1;
        //https://stackoverflow.com/questions/8241060/how-to-get-little-endian-data-from-big-endian-in-c-sharp-using-bitconverter-toin
        //https://stackoverflow.com/a/8241127 by https://stackoverflow.com/users/23354/marc-gravell
        var value = unchecked((uint)((buffer[i++] << 24) | (buffer[i++] << 16) | (buffer[i++] << 8) | buffer[i++]));
        return value;
    }
}
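To make the header layout concrete, an array 32 header is the marker byte 0xdd followed by the element count as a big-endian 32-bit unsigned integer. The sketch below is an alternative formulation of FormatArray32Header/ParseArray32Header. It assumes System.Buffers.Binary.BinaryPrimitives is available (.NET Core 2.1+, or the System.Memory package on .NET Framework); that substitution, and the Array32Header class name, are illustrative and not part of the original code.

```csharp
using System;
using System.Buffers.Binary;

public static class Array32Header
{
    public const byte Marker = 0xdd;  // MessagePack "array 32" type byte
    public const int Length = 5;      // 1 marker byte + 4 count bytes

    // Writes 0xdd followed by the element count in big-endian byte order.
    public static byte[] Format(uint count)
    {
        var buffer = new byte[Length];
        buffer[0] = Marker;
        BinaryPrimitives.WriteUInt32BigEndian(buffer.AsSpan(1), count);
        return buffer;
    }

    // Reads the element count back, rejecting anything that is not an array 32 header.
    public static uint Parse(byte[] buffer)
    {
        if (buffer == null || buffer.Length < Length || buffer[0] != Marker)
            throw new ArgumentException("Buffer does not start with an array 32 header.");
        return BinaryPrimitives.ReadUInt32BigEndian(buffer.AsSpan(1, 4));
    }
}
```

For example, Array32Header.Format(70000) yields the bytes dd 00 01 11 70, because 70000 is 0x00011170.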
It can be used to append your list_temp as follows:
// Append each entry sequentially
using (var fileStream = new FileStream(filename, FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    MessagePackExtensions.AppendToFile(fileStream, list_temp);
}
And then later, to deserialize the entire file, do:
List<List<struct_realTime>> allItemsInFile;
using (var fileStream = File.OpenRead(filename))
{
    allItemsInFile = MessagePackSerializer.Deserialize<List<List<struct_realTime>>>(fileStream);
}
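To see what a file built by AppendToFile looks like on disk, here is a stdlib-only sketch of the same header-patching logic run against a MemoryStream. Array32Appender.AppendRaw is a hypothetical stand-in that takes already-encoded MessagePack bytes instead of calling MessagePackSerializer, so it runs without the MessagePack package. Appending the single-byte positive fixints 0x01, 0x02 and 0x03 leaves dd 00 00 00 03 01 02 03 in the stream. That is a valid (if not maximally compact) MessagePack encoding of the array [1, 2, 3], which is why a single Deserialize call can read the whole file back.

```csharp
using System;
using System.IO;

public static class Array32Appender
{
    const byte Array32 = 0xdd;
    const int HeaderLength = 5;

    // Mirrors AppendToFile, but appends an already-encoded MessagePack item.
    public static void AppendRaw(Stream stream, byte[] encodedItem)
    {
        if (stream == null || !stream.CanSeek)
            throw new ArgumentException("Stream must be non-null and seekable.", nameof(stream));

        // Read the current root header, if the stream is not empty.
        var header = new byte[HeaderLength];
        stream.Position = 0;
        int read = stream.Read(header, 0, HeaderLength);

        uint count = 0;
        if (read != 0)
        {
            if (read < HeaderLength || header[0] != Array32)
                throw new InvalidDataException("Stream does not start with an array 32 header.");
            count = unchecked((uint)((header[1] << 24) | (header[2] << 16) | (header[3] << 8) | header[4]));
        }

        // Rewrite the header in place with the incremented element count.
        count++;
        header[0] = Array32;
        header[1] = unchecked((byte)(count >> 24));
        header[2] = unchecked((byte)(count >> 16));
        header[3] = unchecked((byte)(count >> 8));
        header[4] = unchecked((byte)count);
        stream.Position = 0;
        stream.Write(header, 0, HeaderLength);

        // Append the new element after everything already in the stream.
        stream.Position = stream.Length;
        stream.Write(encodedItem, 0, encodedItem.Length);
    }
}
```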
Notes:

- The MessagePack protocol has three different array formats:
  - fixarray stores an array whose length is up to 15 elements.
  - array 16 stores an array whose length is up to (2^16)-1 elements.
  - array 32 stores an array whose length is up to (2^32)-1 elements.

  The extension method requires that the root array be array 32, to eliminate the need to reformat the entire array when the new size exceeds the capacity of fixarray or array 16. MessagePackSerializer, however, will always write the most compact format, so appending to a collection previously serialized by MessagePackSerializer isn't guaranteed to work (ParseArray32Header throws if it finds a fixarray or array 16 header).
- If you want to use a fast binary serializer that doesn't require an array count or size at the beginning of the file, thereby supporting append operations out of the box, consider protobuf-net. For details see "I have a Single File And need to serialize multiple objects randomly. How can I in c#?" and "How to append object to a file while serializing using c# protobuf-net?". For a general overview of how to use this serializer see https://github.com/protobuf-net/protobuf-net#protobuf-net and Protobuf-net: the unofficial manual. You will need to mark your types with attributes similar to those of MessagePackSerializer.
Demo fiddle #3 here.
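The three array formats from the notes can be summarized in a few lines of code. The sketch below relies only on the fixarray / array 16 / array 32 rules of the MessagePack specification and computes the most compact header that a serializer always choosing the smallest form would emit. CompactArrayHeader is a hypothetical name, not part of MessagePack-CSharp. It shows why the caveat matters. A single list_temp of 100 elements gets an array 16 header (dc 00 64), which ParseArray32Header rejects. A list of 100000 elements gets an array 32 header (dd 00 01 86 a0), which ParseArray32Header would accept and misread as 100000 previously appended items.

```csharp
using System;

public static class CompactArrayHeader
{
    // Extracts one byte of a big-endian value; masking keeps the cast safe even in checked builds.
    static byte B(uint value, int shift) => (byte)((value >> shift) & 0xFF);

    // Returns the most compact MessagePack array header for the given element count.
    public static byte[] For(uint count)
    {
        if (count <= 15)
            return new[] { (byte)(0x90 | count) };                  // fixarray: 1001xxxx
        if (count <= 0xFFFF)
            return new[] { (byte)0xdc, B(count, 8), B(count, 0) };  // array 16
        return new[] { (byte)0xdd, B(count, 24), B(count, 16), B(count, 8), B(count, 0) }; // array 32
    }

    // True only for the array 32 form that AppendToFile can patch in place.
    public static bool IsArray32(byte[] header) => header.Length == 5 && header[0] == 0xdd;
}
```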