Reputation: 71
I want to convert a CSV file to Parquet format. Is there any way to do this in C#?
Upvotes: 0
Views: 4358
Reputation: 11
I ran into memory issues with Cinchoo: loading 5 million rows used more than 20 GB of RAM and then crashed.
If you have Spark installed locally, it handles this kind of task with ease. From C# you can use dotnet.spark (.NET for Apache Spark), or stick to Scala/PySpark; the APIs are identical except for casing.
Scala would be the quickest to implement, as it requires less environment setup.
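using Microsoft.Spark.Sql;

// Create (or reuse) the Spark session.
_spark = SparkSession
    .Builder()
    .AppName(_configuration.ApplicationName)
    .GetOrCreate();

// Read the CSV and write it back out as Parquet.
_spark
    .Read()
    // List of options: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
    .Options(new() { { "delimiter", ";" } })
    .Csv("W:\\folder\\exported.csv")
    .Write()
    // Spark writes a directory of part files here, not a single .parquet file.
    .Parquet("W:\\folder\\converted");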
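For comparison, here is a rough PySpark sketch of the same pipeline, to show how closely the two APIs line up apart from casing. The helper name `convert_csv_to_parquet`, the `main` wrapper, and the app name `csv-to-parquet` are made up for illustration; the paths are the ones from the C# snippet. It assumes PySpark and a local Spark installation are available.

```python
def convert_csv_to_parquet(spark, src, dst, delimiter=";"):
    """Read a delimited text file with Spark and write it out as Parquet.

    `spark` is an existing SparkSession; `src` and `dst` are paths.
    (Hypothetical helper, not part of PySpark itself.)
    """
    (
        spark.read
        .option("delimiter", delimiter)  # same CSV option as in the C# snippet
        .csv(src)
        .write
        .parquet(dst)  # Spark writes a directory of part files at `dst`
    )


def main():
    # Imported here so the helper above can be defined without PySpark present.
    from pyspark.sql import SparkSession

    # "csv-to-parquet" is an arbitrary app name chosen for this sketch.
    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()
    convert_csv_to_parquet(
        spark, "W:\\folder\\exported.csv", "W:\\folder\\converted"
    )
    spark.stop()
```

Calling `main()` runs the conversion. As in the C# version, no header or schema options are set, so add whatever your file needs from the CSV options list linked above.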
Upvotes: 0
Reputation: 6322
With Cinchoo ETL, an open-source library, you can easily convert a CSV file to Parquet.
Install the NuGet package
> install-package ChoETL.Parquet
Sample code
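using ChoETL;

// Sample CSV input; the first line is the header.
string csv = @"Id, Name
1, Tom
2, Mark";

using (var r = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    .WithMaxScanRows(2)
    .QuoteAllFields()
    )
{
    // Replace the placeholder with the path of the Parquet file to create.
    using (var w = new ChoParquetWriter("*** Your Parquet file ***"))
    {
        // Write every record read from the CSV to the Parquet file.
        w.Write(r);
    }
}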
Upvotes: 3