Latheesan
Latheesan

Reputation: 24116

Parsing csv files with specific column delimiter & string enclosure using the CsvHelper library

I am using this http://joshclose.github.io/CsvHelper/ awesome library in my .NET C# project for my CSV parsing requirements.

If I have a CSV file that looks like this:

SupplierSku,MappedSageSku
EG1234,EGCD1234
EG4567,EG-XZ567

I usually create a DTO class like this:

public class SkuMapping
{
    public string SupplierSku { get; set; }
    public string MappedSageSku { get; set; }
}

and parse the csv file like this:

// Open & parse selected csv file
var csvReader = new CsvReader(File.OpenText(selectSkuMapping.Text));
var skuMappings = csvReader.GetRecords<SkuMapping>();

// Do something with each row
foreach (SkuMapping skuMapping in skuMappings)
{
    // ...
}

This works great for pre-defined / structured CSV files.

I now have the requirement to parse arbitrary CSV files, where they may contains various column delimiter & string enclosure and exact number of columns on csv is not known, but the column index containing the data I need is known.

Example #1

PartNumb,InStock,PrGroup
"A-X-1230",Y,103
"B-DD-1231",Y,103

Where; column delimiter = , and string enclosure = "

Data I need: Column index 0 (PartNumb) and Column index 1 (InStock)

Example #2

SupplierSku,CatIds,StockStatus,Active
%ADA-BB-124%|4,5,1|%AV%|1
%XAS-E4-S11%|97,41,65|%OS%|0

Where; column delimiter = | and string enclosure = %

Data I need: Column index 0 (SupplierSku) and Column index 2 (StockStatus)


So given the above, what is the best way to use the CsvHelper library to parse arbitrary csv files (where column delimiter, string enclosure and column indexes are known)? I also need the option to skip first row on csv (sometimes csv contains header row and sometimes they dont).

Upvotes: 2

Views: 11932

Answers (5)

CodingYourLife
CodingYourLife

Reputation: 8588

There was a package upgrade and the new syntax is now:

using (var reader = new StreamReader("./myfile.csv", Encoding.UTF8))
using (var csv = new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture, delimiter: ";", encoding: Encoding.UTF8)))
{
    //do something with csv
}

Update

As pointed out in the comments and seemingly already migrated in my codebase new syntax is

using (var reader = new StreamReader("./myfile.csv", Encoding.UTF8))
using (var csv = new CsvReader(reader, new CsvConfiguration(CultureInfo.InvariantCulture) { Delimiter = ";", Encoding = Encoding.UTF8 }))
{
    //do something with csv
}

Upvotes: 5

Antonio Reyes
Antonio Reyes

Reputation: 536

CsvReader csv = new CsvReader(new StreamReader(stream), true, ';')

Configuration.Delimiter it is not working any more, now you have to pass the delimiter as a param in the CsvReader initialization

Upvotes: 0

Harald Coppoolse
Harald Coppoolse

Reputation: 30464

The answer depends on what you want:

  • Do you know which delimiter(s) are allowed before you start parsing?

If you don't know which delimiters are allowed, you are in trouble: Is 'A'a delimiter? Is 'B' a delimiter?. Let's assume there is a set of characters that you think are suitable as delimiters for the actual CSV stream you are going to parse.

  1. Replace all occurrences of your delimiters with the same one, for example ';'
  2. You can do this using String.Replace(char, char) for each accepted delimiter, or use a regular expression

    • Do you want to select your columns by number of by name?

If you only need certain columns, create a map where you tell the CsvHelper which column must be mapped to which destination.

Example: if you need to map column "MyColumn" to Property YourProperty create a map:

private sealed class MyCsvConverterMap : CsvClassMap<MyDestinationType>
{
    public MyCsvConverterMap()
    {
        Map(item => item.YourProperty).Name("MyColumn");
        // map all properties in your destination to a column
    }
}

using (TextReader txtReader = new StringReader(...))
{
    CsvReader csvReader = new CsvReader(txtReader);
    csvReader.Configuration.Delimiter = ";";
    csvReader.Configuration.HasHeaderRecord = true;
    csvReader.Configuration.RegisterClassMap(new MyCsvConverterMap());

    while (csvReader.Read())
    {
         MyDestinationType convertedRecord = csvReader.GetRecord<MyDestinationType>();
         ...

Addition

It is also possible to map by column number instead of mapping by column name. See an explanation of all kinds of maps: CsvHelper Getting started

Upvotes: 3

rdvanbuuren
rdvanbuuren

Reputation: 696

With a bit more effort you can also create a type-safe mapping with the CsvClassMap and inheriting from the DefaultTypeConverter class to create a converter for the CatIds (comma separated).

Here's an example that works for your example #2:

[TestClass]
public class CsvHelperTest
{
    [TestMethod]
    public void Test()
    {
        var textToParse = "SupplierSku,CatIds,StockStatus,Active" + Environment.NewLine;
        textToParse += "%ADA-BB-124%|4,5,1|%AV%|1" + Environment.NewLine;
        textToParse += "%XAS-E4-S11%|97,41,65|%OS%|0";

        using (var stringReader = new StringReader(textToParse))
        {
            using (var reader = new CsvReader(stringReader))
            {
                reader.Configuration.Quote = '%';
                reader.Configuration.Delimiter = "|";
                reader.Configuration.HasHeaderRecord = true; // If there is no header, set to false.
                reader.Configuration.RegisterClassMap<StockMap>();

                foreach(var stock in reader.GetRecords<Stock>())
                {
                    // normally do something with data, now just test

                    Assert.IsNotNull(stock.SupplierSku);
                    Assert.IsTrue(stock.SupplierSku.IndexOf('%') == -1, "Quotes should be stripped");
                    Assert.IsNotNull(stock.CatIds);
                    Assert.AreEqual(3, stock.CatIds.Length, "Expected 3 CatIds");
                }
            }
        }
    }

    public class StockMap : CsvClassMap<Stock>
    {
        public StockMap()
        {
            Map(stock => stock.SupplierSku).Index(0);
            Map(stock => stock.CatIds).Index(1).TypeConverter<CatIdsConverter>();
            Map(stock => stock.StockStatus).Index(2);
            Map(stock => stock.Active).Index(3); // 1 is true, 0 is false
        }
    }

    public class Stock
    {
        public string SupplierSku { get; set; }
        public int[] CatIds { get; set; }
        public StockStatus StockStatus { get; set; }
        public bool Active { get; set; }
    }

    public enum StockStatus
    {
        AV, OS
    }

    public class CatIdsConverter : DefaultTypeConverter
    {
        public override bool CanConvertFrom(Type type)
        {
            return type == typeof(string);
        }

        public override object ConvertFromString(TypeConverterOptions options, string text)
        {
            if (string.IsNullOrEmpty(text))
                return null;

            var catIds = text.Split(',').Select(c=> Convert.ToInt32(c)).ToArray();
            return catIds;
        }
    }
}

For example #1 just configure Quote = '"', Delimiter = ",", Add another class & CsvClassMap<> implementation and configure that within another CsvReader.

Upvotes: 1

Euan T
Euan T

Reputation: 2141

This seems to work, using CsvHelper:

var textToParse = @"SupplierSku,CatIds,StockStatus,Active
%ADA-BB-124%|4,5,1|%AV%|1
%XAS-E4-S11%|97,41,65|%OS%|0";

string supplierSku;
string stockStatus;

using (var stringReader = new StringReader(textToParse))
{
    using (var reader = new CsvReader(stringReader))
    {
        reader.Configuration.Delimiter = ",";
        reader.Configuration.HasHeaderRecord = true; // If there is no header, set to false.

        while (reader.Read())
        {
            supplierSku = reader.GetField("SupplierSku"); // Or reader.GetField(0)
            stockStatus = reader.GetField("StockStatus"); // Or reader.GetField(2)

            Console.WriteLine($"SKU: {supplierSku}; Status: {stockStatus}");
        }
    }
}

However, it doesn't automatically trim/remove the quote characters - you can easily do that yourself using Trim() or Substring(). There is a little more manual effort involved, but it is still easier than doing it manually.

Upvotes: 4

Related Questions