I have recently started learning more about CsvHelper and I need advice on how to achieve my goal.
I have a CSV file containing user records (thousands to hundreds of thousands of records), and I need to parse the file and validate/process the data. I need to do two things:
1. Validate the whole row while it is being read.
2. One record can be present multiple times with different date ranges. I need to validate that the ranges don't overlap and, if they do, write the WHOLE ORIGINAL LINE to an error file.
At a minimum I could get by with a way to preserve the whole original row alongside the parsed data, but a way to validate the whole row while the raw data is still available would be better.
Are there any events/actions I can use to validate a row of data after it is created but before it is added to the collection?
If not, is there a way to save the whole RAW row into the record so I can verify the row after parsing it AND, if it is not valid, handle it as needed?
I've created a record class like this:
public class Record
{ // simplified and omitted fluff for brevity
    public string Login { get; set; }
    public string Domain { get; set; }
    public DateTime? Created { get; set; }
    public DateTime? Ended { get; set; }
}
and a class map:
class RecordMapping : ClassMap<Record>
{ // simplified and omitted fluff for brevity
    public RecordMapping(ConfigurationElement config)
    {
        //..the set up of the mapping...
    }
}
and then I use them like this:
public void ProcessFile(...)
{
    ...
    using (var reader = new StreamReader(...))
    using (var csvReader = new CsvReader(reader))
    using (var errorWriter = new StreamWriter(...))
    {
        csvReader.Configuration.RegisterClassMap(new RecordMapping(config));
        //...set up of csvReader configuration...
        try
        {
            var records = csvReader.GetRecords<Record>();
        }
        catch (Exception ex)
        {
            //..in case of problems...
        }
        ....
    }
    ....
}
In this scenario the data might be "valid" from CsvHelper's viewpoint, because it can read the data, but invalid for more complex reasons (like an invalid date range).
In that case, this might be a simple approach:
public IEnumerable<Thing> ReadThings(TextReader textReader)
{
    var result = new List<Thing>();
    using (var csvReader = new CsvReader(textReader))
    {
        while (csvReader.Read())
        {
            var thing = csvReader.GetRecord<Thing>();
            if (IsThingValid(thing))
                result.Add(thing);
            else
                LogInvalidThing(thing);
        }
    }
    return result;
}
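The answer leaves IsThingValid undefined. Below is a minimal sketch of what it might check, assuming Thing has the same shape as the question's Record (Login, Domain, Created, Ended); the ThingRules class and the specific rules are hypothetical. Per-row checks like these cannot detect overlapping ranges, because that requires the other rows for the same user.

```csharp
using System;

// Hypothetical: assumed to mirror the question's Record class.
public class Thing
{
    public string Login { get; set; }
    public string Domain { get; set; }
    public DateTime? Created { get; set; }
    public DateTime? Ended { get; set; }
}

public static class ThingRules
{
    // Per-row checks only; rules that need other rows (such as overlapping
    // date ranges for the same user) cannot be decided here.
    public static bool IsThingValid(Thing thing)
    {
        // Assumed rule: every row needs a login and a domain.
        if (string.IsNullOrWhiteSpace(thing.Login) || string.IsNullOrWhiteSpace(thing.Domain))
            return false;

        // Assumed rule: a range that ends before it starts is malformed.
        if (thing.Created.HasValue && thing.Ended.HasValue && thing.Ended.Value < thing.Created.Value)
            return false;

        return true;
    }
}
```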
If what you need to log is the raw text, that would be:
LogInvalidRow(csvReader.Context.RawRecord);
Another option, perhaps a better one, is to separate validation from reading entirely. In other words, just read the records with no validation:
var records = csvReader.GetRecords<Record>();
Your reader class returns them without being responsible for determining which are valid and what to do with them.
Then another class can validate an IEnumerable<Record>, returning the valid rows and logging the invalid ones.
That way the validation and logging logic isn't tied up with the reading code. It will be easier to test, and easier to reuse if you ever get a collection of Record from something other than a CSV file.
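A minimal sketch of such a validator for the overlap rule from the question. It assumes a user is identified by Login plus Domain, that a missing Created or Ended means an open-ended range, that ranges sharing a boundary date count as overlapping, and that every row involved in an overlap is rejected. ParsedRow and RecordValidator are hypothetical names. Keeping the raw line next to the parsed record is what lets the caller write the whole original line to the error file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the question's Record class.
public class Record
{
    public string Login { get; set; }
    public string Domain { get; set; }
    public DateTime? Created { get; set; }
    public DateTime? Ended { get; set; }
}

// Hypothetical wrapper: a parsed record plus the raw CSV line it was read from.
public class ParsedRow
{
    public ParsedRow(Record record, string rawLine)
    {
        Record = record;
        RawLine = rawLine;
    }

    public Record Record { get; }
    public string RawLine { get; }
}

public static class RecordValidator
{
    // Assumption: a missing Created means "since forever", a missing Ended means "still active".
    private static DateTime Start(Record r) => r.Created ?? DateTime.MinValue;
    private static DateTime End(Record r) => r.Ended ?? DateTime.MaxValue;

    // Closed ranges: two ranges that share a boundary date are treated as overlapping.
    private static bool Overlaps(Record a, Record b) =>
        Start(a) <= End(b) && Start(b) <= End(a);

    // Returns the rows with no overlapping range for the same Login + Domain and passes
    // every row involved in an overlap to reportInvalid (e.g. to write RawLine to an error file).
    public static List<ParsedRow> Validate(IEnumerable<ParsedRow> rows, Action<ParsedRow> reportInvalid)
    {
        var valid = new List<ParsedRow>();

        foreach (var group in rows.GroupBy(r => (r.Record.Login, r.Record.Domain)))
        {
            var items = group.ToList();
            var bad = new bool[items.Count];

            // Pairwise check within one user's rows: quadratic, but each group is expected to be small.
            for (int i = 0; i < items.Count; i++)
                for (int j = i + 1; j < items.Count; j++)
                    if (Overlaps(items[i].Record, items[j].Record))
                        bad[i] = bad[j] = true;

            for (int i = 0; i < items.Count; i++)
            {
                if (bad[i]) reportInvalid(items[i]);
                else valid.Add(items[i]);
            }
        }

        return valid;
    }
}
```

In the reading loop from the answer, each row could be wrapped as `new ParsedRow(csvReader.GetRecord<Record>(), csvReader.Context.RawRecord)`; where RawRecord is exposed differs between CsvHelper versions, so check it against the version you use. The validator holds all rows in memory (GroupBy buffers its input), which is hard to avoid when an overlap can span the whole file, and it returns valid rows grouped by user rather than in file order.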