Reputation: 185
I have a Parquet file that I am trying to convert to a CSV file. Most recommendations point to Spark, but I need to do this in C#, specifically on .NET Core 3.0.
It's tricky because Parquet is a columnar format, which makes it awkward to convert to row-based CSV.
I have tried loading it into a DataTable, but I don't like that solution because it needs the entire file in memory, and I am somehow losing certain records.
I am using Parquet.Net, but I am open to any other Parquet library that works on .NET Core/.NET Standard.
Thank you in advance.
Upvotes: 1
Views: 8026
Reputation: 6322
With Cinchoo ETL, an open-source library, you can convert a Parquet file to CSV easily.
Install the NuGet package:
Install-Package ChoETL.Parquet
Sample code:
using System;
using System.Text;
using ChoETL;

// Collects the CSV output in memory.
StringBuilder csv = new StringBuilder();
using (var r = new ChoParquetReader(@"*** Your Parquet file ***")
    .ParquetOptions(o => o.TreatByteArrayAsString = true)
)
{
    using (var w = new ChoCSVWriter(csv)
        .WithFirstLineHeader()
        .UseNestedKeyFormat(false)
    )
        w.Write(r);
}
Console.WriteLine(csv.ToString());
For more information, see the CodeProject article.
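Since the question mentions not wanting the whole file in memory, here is a hedged variant of the sample that writes to a file instead of a StringBuilder. It is only a sketch: it assumes ChoCSVWriter accepts an output file path in its constructor and that the reader yields records lazily, and both file names are placeholders.

```csharp
using ChoETL;

class Program
{
    static void Main()
    {
        // "input.parquet" and "output.csv" are placeholder paths (assumptions).
        using (var r = new ChoParquetReader("input.parquet")
            .ParquetOptions(o => o.TreatByteArrayAsString = true))
        // Assumption: ChoCSVWriter has a constructor overload taking a file path.
        using (var w = new ChoCSVWriter("output.csv")
            .WithFirstLineHeader()
            .UseNestedKeyFormat(false))
        {
            // Records are passed from the reader to the writer
            // without building the full CSV text in a string first.
            w.Write(r);
        }
    }
}
```

Writing to a file avoids holding the complete CSV in a StringBuilder, which matters for large inputs.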
Upvotes: 3
Reputation: 17091
I haven't given it a shot, but I wonder whether you could leverage / abuse the Microsoft Spark SQL libraries to your benefit.
There's DataFrameReader.Parquet(String[]), and also the DataFrameWriter.Csv(String) method.
I wonder whether you could use a DataFrame as an in-memory intermediary.
It's just a guess at the moment as your question intrigued me, perhaps I'll give it a shot once I've got some sleep. :-)
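For what it's worth, the two calls mentioned above could be combined roughly as follows. This is an untested sketch, assuming the Microsoft.Spark NuGet package (.NET for Apache Spark) and a working Spark installation; the app name and both paths are placeholders.

```csharp
using Microsoft.Spark.Sql;

class Program
{
    static void Main(string[] args)
    {
        // Requires a Spark installation; the app is normally launched via spark-submit.
        SparkSession spark = SparkSession
            .Builder()
            .AppName("parquet-to-csv") // placeholder name
            .GetOrCreate();

        // DataFrameReader.Parquet takes one or more paths ("input.parquet" is a placeholder).
        DataFrame df = spark.Read().Parquet("input.parquet");

        // DataFrameWriter.Csv writes a directory of part files, not a single CSV file.
        df.Write()
            .Mode("overwrite")
            .Option("header", true)
            .Csv("output_csv");

        spark.Stop();
    }
}
```

Note that the DataFrame lives in the Spark runtime rather than in the .NET process, and the output is a folder containing one or more part files, so this only makes sense if running Spark alongside the application is acceptable.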
Upvotes: 1