jamie2015

Reputation: 291

CsvHelper : How to detect the Delimiter from the given csv file

I am using CsvHelper to read and write data in a CSV file. Now I want to detect the delimiter used in the CSV file. How can I get it?

My code:

     var parser = new CsvParser(txtReader);
     delimiter = parser.Configuration.Delimiter;

The delimiter I get is always ",", but the actual delimiter in the CSV file is "\t".

Upvotes: 9

Views: 15756

Answers (4)

Vincent

Reputation: 941

There is (at least now) a DetectDelimiter configuration value, which is set to false by default. Once it is enabled, you can also specify which delimiters you wish to test, although the default set is sensible.
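
As a rough configuration sketch of the above, assuming a recent CsvHelper release where the options are set through CsvConfiguration: the DetectDelimiterValues list and the Parser.Delimiter property are written from memory, so check them against the version you have installed. The file name is hypothetical.

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Off by default; turn it on to let CsvHelper guess the delimiter.
    DetectDelimiter = true,
    // Assumption: candidates to test. Leave this out to keep the library defaults.
    DetectDelimiterValues = new[] { ",", ";", "|", "\t" },
};

// "data.csv" is a hypothetical path used only for illustration.
using var reader = new StreamReader("data.csv");
using var csv = new CsvReader(reader, config);

// Detection happens when reading starts.
csv.Read();

// Assumption: recent versions expose the delimiter actually in use here.
Console.WriteLine($"Delimiter in use: '{csv.Parser.Delimiter}'");
```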

Upvotes: 4

Steven

Reputation: 1494

I had to deal with the possibility that the CSV file (saved in MS Excel) could contain a different delimiter depending on the user's localization settings, so I ended up with the following approach:

public static string DetectDelimiter(StreamReader reader)
{
    // Candidate delimiters, checked in this order
    var possibleDelimiters = new List<string> { ",", ";", "\t", "|" };

    var headerLine = reader.ReadLine();

    // Reset the reader to its initial position so the caller can reuse it
    // (this requires a seekable stream). Otherwise CsvHelper won't find the
    // header line, because it has already been read from the reader.
    reader.BaseStream.Position = 0;
    reader.DiscardBufferedData();

    // An empty stream yields null; fall back to the default delimiter
    if (headerLine == null)
    {
        return possibleDelimiters[0];
    }

    foreach (var possibleDelimiter in possibleDelimiters)
    {
        if (headerLine.Contains(possibleDelimiter))
        {
            return possibleDelimiter;
        }
    }

    return possibleDelimiters[0];
}

I also needed to reset the reader's read position, since it was the same instance I passed to the CsvReader constructor.

The usage was then as follows:

using (var textReader = new StreamReader(memoryStream))
{
    var delimiter = DetectDelimiter(textReader);

    using (var csv = new CsvReader(textReader))
    {
        csv.Configuration.Delimiter = delimiter;

        // ... rest of the csv reader process

    }
}
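
One caveat with the first-match check: a header such as "Name;Address, City;Zip" contains a comma, so "," is returned even though the file is semicolon-delimited. A possible variant, sketched here under my own assumptions (the DelimiterGuess class and MostFrequent method are hypothetical names, not part of CsvHelper), picks the candidate that occurs most often in the header line. It still ignores quoting, so treat it as a heuristic only.

```csharp
using System;
using System.Collections.Generic;

public static class DelimiterGuess
{
    // Hypothetical helper: returns the candidate that occurs most often in
    // the header line, falling back to the first candidate when none occurs.
    public static string MostFrequent(string headerLine, IReadOnlyList<string> candidates)
    {
        if (string.IsNullOrEmpty(headerLine))
        {
            return candidates[0];
        }

        var best = candidates[0];
        var bestCount = 0;

        foreach (var candidate in candidates)
        {
            // Splitting on the candidate yields (occurrences + 1) parts.
            var count = headerLine.Split(new[] { candidate }, StringSplitOptions.None).Length - 1;

            // Strictly greater, so ties keep the earlier candidate.
            if (count > bestCount)
            {
                best = candidate;
                bestCount = count;
            }
        }

        return best;
    }
}
```

It could replace the foreach loop in DetectDelimiter, e.g. return DelimiterGuess.MostFrequent(headerLine, possibleDelimiters);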

Upvotes: 10

Josh Close

Reputation: 23383

CSV stands for Comma Separated Values. I don't think you can reliably detect whether a different character is used as the separator. If there is a header row, then you might be able to rely on it.

You should know the separator that is used. You should be able to see it when opening the file. If the source of the files gives you a different separator each time and is not reliable, then I'm sorry. ;)

If you just want to parse using a different delimiter, then you can set csv.Configuration.Delimiter. http://joshclose.github.io/CsvHelper/#configuration-delimiter

Upvotes: 3

Maraboc

Reputation: 11083

I found this piece of code on this site:

public static char Detect(TextReader reader, int rowCount, IList<char> separators)
{
    IList<int> separatorsCount = new int[separators.Count];

    int character;

    int row = 0;

    bool quoted = false;
    bool firstChar = true;

    while (row < rowCount)
    {
        character = reader.Read();

        switch (character)
        {
            case '"':
                if (quoted)
                {
                    // Inside a quoted value: a lone quote closes the value,
                    // while a doubled quote ("") is an escaped quote, so
                    // read (skip) the peeked quote.
                    if (reader.Peek() != '"')
                        quoted = false;
                    else
                        reader.Read();
                }
                else
                {
                    // Treat the value as quoted only if this quote is the
                    // first character of the value.
                    if (firstChar)
                        quoted = true;
                }
                break;
            case '\n':
                if (!quoted)
                {
                    ++row;
                    firstChar = true;
                    continue;
                }
                break;
            case -1:
                row = rowCount;
                break;
            default:
                if (!quoted)
                {
                    int index = separators.IndexOf((char)character);
                    if (index != -1)
                    {
                        ++separatorsCount[index];
                        firstChar = true;
                        continue;
                    }
                }
                break;
        }

        if (firstChar)
            firstChar = false;
    }

    int maxCount = separatorsCount.Max();

    return maxCount == 0 ? '\0' : separators[separatorsCount.IndexOf(maxCount)];
}

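Here, separators is the list of possible separators you expect, and rowCount is the number of rows to inspect. The method returns '\0' if none of the separators is found, and it needs using System.Linq for Max().

Hope that helps :)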

Upvotes: 4
