Reputation: 2839
Does anyone know how to perform row-based read access to a Parquet file using ParquetSharp? This is where I have got to, but the inputStream throws a "cannot convert to string" error.
using (var buffer = new ResizableBuffer())
{
    using (var reader = new ParquetFileReader(@"C:\Users\X\Documents\X.parquet"))
    {
        using (var inputStream = new BufferReader(buffer))
        {
            using (var readerRow = ParquetFile.CreateRowReader<Tuple>(inputStream))
            {
            }
        }
    }
}
Also, ParquetSharp uses TTuple, but I cannot find any definition for it anywhere.
I know Parquet is column-based, so this is not the most efficient way to read, but it is convenient for my work.
Regards
Upvotes: 0
Views: 1900
Reputation: 298
The row-oriented API of ParquetSharp uses reflection to discover the public fields of the given row structure or class. TTuple is just a generic parameter, a placeholder for the row type.
It works with custom structures or classes, System.Tuple and System.ValueTuple. You can see a few examples in https://github.com/G-Research/ParquetSharp/blob/master/csharp.test/TestRowOrientedParquetFile.cs
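To make the reflection point concrete, here is a minimal sketch using only the standard System.Reflection API. It is an illustration of the idea, not ParquetSharp's actual implementation; the PrintPublicFields helper is a hypothetical name introduced for this example.

```csharp
using System;
using System.Reflection;

// Example row type with two public fields.
internal struct MyStruct
{
    public int FirstField;
    public string SecondField;
}

internal static class Program
{
    // Hypothetical helper: lists the public instance fields of a row type,
    // which is the kind of information a reflection-based reader can use.
    private static void PrintPublicFields(Type rowType)
    {
        foreach (FieldInfo field in rowType.GetFields(BindingFlags.Public | BindingFlags.Instance))
        {
            Console.WriteLine($"{rowType.Name}.{field.Name}: {field.FieldType.Name}");
        }
    }

    private static void Main()
    {
        // A custom struct exposes its own field names.
        PrintPublicFields(typeof(MyStruct));

        // (int firstField, string secondField) is ValueTuple<int, string> at run time,
        // so reflection only sees the fields Item1 and Item2.
        PrintPublicFields(typeof(ValueTuple<int, string>));
    }
}
```

The struct reports FirstField and SecondField, while the value tuple reports only Item1 and Item2, because C# tuple element names exist at compile time and are not part of the runtime type.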
To take your example, you would define your expected row type:
internal struct MyStruct
{
    public readonly int FirstField;
    public readonly string SecondField;
}
And then somewhere in your method:
using (var reader = ParquetFile.CreateRowReader<MyStruct>(@"C:\Users\X\Documents\X.parquet"))
{
    /* read rows */
}
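As a rough sketch of what the "read rows" part could look like: the row reader exposes the file metadata and reads one row group at a time. This assumes the file's first two columns really are an int and a string; the struct here uses plain public fields, and the Console output is only for illustration.

```csharp
using System;
using ParquetSharp.RowOriented;

// Row type assumed to match the columns of the file (an int column, then a string column).
internal struct MyStruct
{
    public int FirstField;
    public string SecondField;
}

internal static class Program
{
    private static void Main()
    {
        using (var reader = ParquetFile.CreateRowReader<MyStruct>(@"C:\Users\X\Documents\X.parquet"))
        {
            // A Parquet file is split into row groups; read them one by one.
            for (int rowGroup = 0; rowGroup < reader.FileMetaData.NumRowGroups; ++rowGroup)
            {
                MyStruct[] rows = reader.ReadRows(rowGroup);

                foreach (var row in rows)
                {
                    Console.WriteLine($"{row.FirstField}, {row.SecondField}");
                }
            }
        }
    }
}
```

Each call to ReadRows materialises a whole row group in memory, so for very large row groups the column-oriented API may be the better fit.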
Personally, I prefer using C# 7 tuples, which save you the trouble of writing your own struct definition in the first place. The only downside is that when writing a Parquet file, ParquetSharp cannot automatically infer the column names from the field names (internally, both System.Tuple and System.ValueTuple have boring field names such as Item1, Item2, etc.).
using (var reader = ParquetFile.CreateRowReader<(int firstField, string secondField)>(@"C:\Users\X\Documents\X.parquet"))
{
    /* read rows */
}
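For the writing case mentioned above, a sketch of how the column names can be supplied explicitly when the row type is a tuple. The output path, the column names and the sample rows are all made up for this example, and the optional parameters of CreateRowWriter are left at their defaults.

```csharp
using System.IO;
using ParquetSharp.RowOriented;

internal static class Program
{
    private static void Main()
    {
        // Hypothetical output location, chosen only for this example.
        string path = Path.Combine(Path.GetTempPath(), "example.parquet");

        // Sample rows; the tuple element names are not visible to ParquetSharp.
        var rows = new[]
        {
            (1, "one"),
            (2, "two"),
        };

        // The column names are passed explicitly because they cannot be inferred from Item1, Item2.
        using (var writer = ParquetFile.CreateRowWriter<(int, string)>(path, new[] { "firstField", "secondField" }))
        {
            writer.WriteRows(rows);
            writer.Close();
        }
    }
}
```

Reading the file back with CreateRowReader<(int, string)> works regardless of the column names, since the tuple elements are matched against the columns by position and type in this sketch.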
Upvotes: 1