itsdhandapani

Reputation: 71

How to convert a CSV file to Parquet using C#

I want to convert a CSV file to Parquet format. Is there any way to do this in C#?

Upvotes: 0

Views: 4358

Answers (2)

grazy27

Reputation: 11

I had memory issues with Cinchoo: loading 5 million rows took more than 20 GB of RAM, and then the process crashed.

If you have Spark installed locally, it handles this kind of task with ease. From C# you can use dotnet.spark, or you can stick with Scala/PySpark; the APIs are identical except for casing.

Scala would be the quickest to implement, as it requires less environment setup.

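using System.Collections.Generic;
using Microsoft.Spark.Sql;

// Create (or reuse) the Spark session.
_spark = SparkSession
             .Builder()
             .AppName(_configuration.ApplicationName)
             .GetOrCreate();

_spark
    .Read()
    // List of options: https://spark.apache.org/docs/latest/sql-data-sources-csv.html
    .Options(new Dictionary<string, string> { { "delimiter", ";" } })
    .Csv("W:\\folder\\exported.csv")
    .Write()
    // The output path is a directory of Parquet part files.
    .Parquet("W:\\folder\\converted");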
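
As a rough sketch of the same Spark approach wrapped in a reusable method, the version below also sets the `header` and `inferSchema` CSV options from the Spark options page linked above and overwrites any existing output. The class name `CsvToParquet`, the method name `Convert`, and the application name are hypothetical. It assumes the Microsoft.Spark NuGet package is referenced and that the program is launched through `spark-submit` with the matching microsoft-spark jar.

```csharp
using Microsoft.Spark.Sql;

public static class CsvToParquet
{
    // Reads a semicolon-delimited CSV with a header row and writes it as Parquet.
    public static void Convert(string csvPath, string parquetDir)
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("csv-to-parquet") // hypothetical application name
            .GetOrCreate();

        DataFrame df = spark
            .Read()
            .Option("header", "true")       // first line holds the column names
            .Option("inferSchema", "true")  // let Spark guess column types from the data
            .Option("delimiter", ";")
            .Csv(csvPath);

        // Spark writes a directory of part files, not a single .parquet file.
        df.Write()
            .Mode(SaveMode.Overwrite)       // replace the output directory if it exists
            .Parquet(parquetDir);

        spark.Stop();
    }
}
```

Without `inferSchema`, every column is read as a string; enabling it costs an extra pass over the input, so for very large files it may be worth supplying an explicit schema instead.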

Upvotes: 0

Cinchoo

Reputation: 6322

With Cinchoo ETL, an open-source library, you can convert a CSV file to Parquet easily.

Install the NuGet package

> install-package ChoETL.Parquet

Sample code

using ChoETL;

string csv = @"Id, Name
1, Tom
2, Mark";

using (var r = ChoCSVReader.LoadText(csv)
    .WithFirstLineHeader()
    .WithMaxScanRows(2)
    .QuoteAllFields()
    )
{
    using (var w = new ChoParquetWriter("*** Your Parquet file ***"))
    {
        w.Write(r);
    }
}
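
The sample above loads the CSV from an in-memory string. Since the question is about a file on disk, here is a minimal sketch of the same conversion reading from a file path. It assumes `ChoCSVReader` can be constructed directly from a file path; the class name `CinchooCsvToParquet` and the method name `Convert` are hypothetical.

```csharp
using ChoETL;

public static class CinchooCsvToParquet
{
    // Reads a CSV file with a header row and writes its records to a Parquet file.
    public static void Convert(string csvPath, string parquetPath)
    {
        using (var r = new ChoCSVReader(csvPath).WithFirstLineHeader())
        using (var w = new ChoParquetWriter(parquetPath))
        {
            // Passes the reader's records to the Parquet writer.
            w.Write(r);
        }
    }
}
```

Usage would look like `CinchooCsvToParquet.Convert(@"C:\data\input.csv", @"C:\data\output.parquet");`, where both paths are placeholders.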

Upvotes: 3
