Reputation: 21
public async Task UploadParquetFromObjects<T>(string fileName, T objects)
{
    // Serialize the objects to JSON text, then load that text through the Cinchoo JSON reader.
    var stringJson = JArray.FromObject(objects).ToString();
    var parsedJson = ChoJSONReader.LoadText(stringJson);

    // Open a writable stream on the target blob (overwrite: true).
    var desBlob = blobClient.GetBlockBlobClient(fileName);
    using (var outStream = await desBlob.OpenWriteAsync(true).ConfigureAwait(false))
    using (ChoParquetWriter parser = new ChoParquetWriter(outStream))
    {
        // Write every parsed record to the Parquet stream.
        parser.Write(parsedJson);
    }
}
I'm using this code to write some data to a file in Azure Blob Storage. At first it seemed to work fine: it created the file, wrote some data to it, and the file was readable. On closer investigation, though, it only writes a fraction of the data I send. For example, I send a list of 15 items and it only writes 3. I tried different datasets, with different sizes and composed of different objects. The number of records written varies, but it never reaches 100%.
Am I doing something wrong?
Upvotes: 1
Views: 320
Reputation: 6332
This issue is tracked and has been addressed in the project's GitHub issues:
https://github.com/Cinchoo/ChoETL/issues/230
The cause was that the input JSON has inconsistent members: where a datetime member is missing from a record, the JSON reader sets it to null, and the Parquet writer could not handle such null datetime values. A fix has been applied.
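To check whether a given dataset has this kind of inconsistency before writing it, something like the following rough sketch may help. It assumes Newtonsoft.Json (already used in the question for `JArray.FromObject`); the class `JsonShapeCheck` and the method `FindMissingOrNullMembers` are hypothetical names made up for this illustration and are not part of ChoETL. It only reports which records lack a member (or carry an explicit null) that other records have; it does not change how the Parquet writer behaves.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

public static class JsonShapeCheck
{
    // Hypothetical helper (not part of ChoETL): maps each record index to the
    // member names that exist somewhere in the array but are missing from,
    // or explicitly null in, that record.
    public static Dictionary<int, List<string>> FindMissingOrNullMembers(JArray records)
    {
        // Union of every member name seen in any object of the array.
        var allNames = records.OfType<JObject>()
            .SelectMany(o => o.Properties().Select(p => p.Name))
            .Distinct()
            .ToList();

        var report = new Dictionary<int, List<string>>();
        for (int i = 0; i < records.Count; i++)
        {
            // Skip array entries that are not JSON objects.
            if (!(records[i] is JObject obj)) continue;

            var problems = allNames
                .Where(n => obj.Property(n) == null || obj[n].Type == JTokenType.Null)
                .ToList();

            // Only records with at least one missing/null member are reported.
            if (problems.Count > 0) report[i] = problems;
        }
        return report;
    }
}
```

In the question's method this could be called as `JsonShapeCheck.FindMissingOrNullMembers(JArray.FromObject(objects))` before handing the data to the writer; an empty result means every record has the same non-null members.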
Sample fiddle: https://dotnetfiddle.net/PwxNWX
Packages used:
ChoETL.JSON.Core v1.2.1.49 (beta2)
ChoETL.Parquet v1.0.1.23 (beta6)
Upvotes: 0