UserControl

Reputation: 15159

How to read a header from a specific line with CsvHelper?

I'm trying to read a CSV file where the header is on row 3:

some crap line
some empty line
COL1,COL2,COl3,...
val1,val2,val3
val1,val2,val3

How do I tell CsvHelper that the header is not on the first row?

I tried to skip 2 lines with Read(), but the subsequent call to ReadHeader() throws an exception saying that the header has already been read.

using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration)) {
   csv.Read();
   csv.Read();
   csv.ReadHeader();
   .....

If I set csvConfiguration.HasHeaderRecord to false, ReadHeader() also fails.

Upvotes: 8

Views: 10475

Answers (3)

dbc

Reputation: 116595

As of CsvHelper 27.0, the problem is no longer reproducible: a header can now be read from any line. This may have been implemented as far back as release 3.0.0 from 2017, which included, according to the change log:

3.0.0

Read more than 1 header row.

Thus the following code now just works, and has worked for a while:

var csvText = "some crap line\nsome empty line\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n\n";
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(csvText));

var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row "some crap line"
    csv.Read(); // Read in the second row "some empty line"
    csv.Read(); // Read in the third row which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.

    Assert.AreEqual(3, csv.HeaderRecord.Length);
    Assert.AreEqual(@"COL1,COL2,COl3", String.Join(",", csv.HeaderRecord));
}

Successful demo fiddle #1 here.

Warning: CsvHelper skips blank lines by default, so if some of the preliminary lines to be skipped might or might not be blank, csv.Read() might silently read past them and then consume your header as well, resulting in the wrong row being used as the header row!

Failing demo fiddle #2 here.
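
For reference, the failure mode described in the warning can be reproduced with a minimal sketch like the one below. It assumes CsvHelper 27 or later and relies on the default IgnoreBlankLines = true. The class and method names (BlankLineDemo, ReadHeaderAfterThreeReads) are hypothetical and exist only for illustration.

```csharp
using System.Globalization;
using System.IO;
using CsvHelper;
using CsvHelper.Configuration;

public static class BlankLineDemo
{
    // Hypothetical helper: calls Read() three times, then ReadHeader(),
    // and returns whichever row ended up being treated as the header.
    public static string[] ReadHeaderAfterThreeReads(string csvText)
    {
        // Default configuration, so IgnoreBlankLines is left at true.
        var config = new CsvConfiguration(CultureInfo.InvariantCulture);
        using var reader = new StringReader(csvText);
        using var csv = new CsvReader(reader, config);

        csv.Read(); // Intended to consume the first preliminary line.
        csv.Read(); // Intended to consume the second preliminary line.
        csv.Read(); // Intended to land on the header row.
        csv.ReadHeader();
        return csv.HeaderRecord;
    }
}
```

Calling BlankLineDemo.ReadHeaderAfterThreeReads("some crap line\n\nCOL1,COL2,COl3\nval1,val2,val3\nval1,val2,val3\n") should return the first data row (val1, val2, val3) rather than the real header, because the blank second line is skipped and every Read() lands one row later than intended.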

To avoid this and deterministically skip a fixed number of lines at the beginning of the file, you must set CsvConfiguration.IgnoreBlankLines = false. However, this property cannot be modified once the CsvReader is created, so if you still need to skip blank data lines after the header, use a ShouldSkipRecord callback:

bool ignoreBlankLines = false;
var csvConfiguration = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    IgnoreBlankLines = false,
    // Skip a record only once ignoreBlankLines is switched on and the record is blank.
    ShouldSkipRecord = args => ignoreBlankLines
        && (args.Record.Length == 0 || (args.Record.Length == 1 && string.IsNullOrEmpty(args.Record[0]))),
    // Your settings here.
};
using (var csv = new CsvReader(new StreamReader(stream), csvConfiguration))
{
    csv.Read(); // Read in the first row "some crap line"
    csv.Read(); // Read in the second row, which is empty.
    csv.Read(); // Read in the third row which is the actual header.
    csv.ReadHeader(); // Process the currently read row as the header.
    ignoreBlankLines = true; // Now that the header has been read, ignore blank data lines.
}

Successful demo fiddle #3 here.

Upvotes: 2

Bruce Dunwiddie

Reputation: 2908

This isn't better than Evk's answer, but I was interested.

The CsvConfiguration class appears to have a Func callback named ShouldSkipRecord that can be wired up to implement custom logic.

https://github.com/JoshClose/CsvHelper/tree/master/src/CsvHelper

CsvConfiguration.cs

/// <summary>
/// Gets or sets the callback that will be called to
/// determine whether to skip the given record or not.
/// This overrides the <see cref="SkipEmptyRecords"/> setting.
/// </summary>
public virtual Func<string[], bool> ShouldSkipRecord { get; set; }

CsvReader.cs

/// <summary>
/// Advances the reader to the next record.
/// If HasHeaderRecord is true (true by default), the first record of
/// the CSV file will be automatically read in as the header record
/// and the second record will be returned.
/// </summary>
/// <returns>True if there are more records, otherwise false.</returns>
public virtual bool Read()
{
    if (doneReading)
    {
        throw new CsvReaderException(DoneReadingExceptionMessage);
    }

    if (configuration.HasHeaderRecord && headerRecord == null)
    {
        ReadHeader();
    }

    do
    {
        currentRecord = parser.Read();
    }
    while (ShouldSkipRecord());

    currentIndex = -1;
    hasBeenRead = true;

    if (currentRecord == null)
    {
        doneReading = true;
    }

    return currentRecord != null;
}

/// <summary>
/// Checks if the current record should be skipped or not.
/// </summary>
/// <returns><c>true</c> if the current record should be skipped, <c>false</c> otherwise.</returns>
protected virtual bool ShouldSkipRecord()
{
    if (currentRecord == null)
    {
        return false;
    }

    return configuration.ShouldSkipRecord != null
        ? configuration.ShouldSkipRecord(currentRecord)
        : configuration.SkipEmptyRecords && IsRecordEmpty(false);
}

Unfortunately, it appears that you'd have to set HasHeaderRecord to false, then set it back to true before calling ReadHeader() or calling Read() on the third line, because the ReadHeader() logic in Read() runs before the ShouldSkipRecord logic.

Upvotes: 1

Evk

Reputation: 101453

Try this:

using (var reader = new StreamReader(stream)) {
    // Discard the two preliminary lines before CsvHelper sees the stream.
    reader.ReadLine();
    reader.ReadLine();
    using (var csv = new CsvReader(reader)) {
        csv.ReadHeader();
    }
}
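
The snippet above targets an older CsvHelper API. A sketch of the same idea against a more recent release might look like the following. It assumes version 27 or later, where the CsvReader constructor takes a culture and Read() is called before ReadHeader(). The class name, method name, and parameters (SkipLinesDemo, ReadColumn, linesToSkip, columnName) are hypothetical.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using CsvHelper;

public static class SkipLinesDemo
{
    // Hypothetical helper: discards the first `linesToSkip` physical lines,
    // treats the next row as the header, and returns one column's values.
    public static List<string> ReadColumn(TextReader reader, int linesToSkip, string columnName)
    {
        // Consume the preamble on the TextReader itself, so CsvHelper never sees it.
        for (var i = 0; i < linesToSkip; i++)
        {
            reader.ReadLine();
        }

        var values = new List<string>();
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        if (!csv.Read())
        {
            return values; // Nothing left after the skipped lines.
        }
        csv.ReadHeader(); // The row just read becomes the header.

        while (csv.Read())
        {
            values.Add(csv.GetField(columnName));
        }
        return values;
    }
}
```

Because ReadLine() counts blank lines like any other line, this variant does not depend on the IgnoreBlankLines setting when skipping the preamble. It does assume the preliminary lines are plain physical lines, with no quoted fields spanning several lines.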

Upvotes: 9
