Reputation: 140
I have sample data that looks like this:
1 This is a random line in the file
2
3 SOURCE_ID|NAME|START_DATE|END_DATE|VALUE_1|VALUE_2
4
5 Another random line in the file
6
7
8
9
10 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|DEF
11 ALEF|ABC|2019-FEB-01|2019-AUG-31|FBC|DGF
12 GILBER|FRED|2019-JAN-01|2019-JAN-31|ABC|TEF
13 FLBER|RED|2019-JUN-01|2019-JUL-31|AJC|DEH
14 GI|JOE|2020-APR-01|2020-DEC-31|GBC|DER
I am unable to save changes to the file. Ie, I can't manipulate/clean the original files before consumption. Any manipulation will need to be done on the fly in memory. But what if the files are large (eg, I am currently testing with some files that are 5m+ records).
I am using CsvHelper
I have already referred to the following threads for guidance:
CSVHelper to skip record before header
Better way to skip extraneous lines at the start?
How to read a header from a specific line with CsvHelper?
What I would like to do is:
If I need perform a combination of stream manipulation before I pass this into the CsvHelper, then do also let me know if that's the missing piece? (and any assistance on how I can actually achieve that under one block of code with be greatly appreciated)
So far I have come up with the below:
string filepath = Path.Combine(txtTst04_File_Location.Text, txtTst04_File_Name.Text);
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
// skip rows to get the header
for (int i = 0; i < 4; i++)
{
csv.Read();
}
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
csv.Configuration.HasHeaderRecord = true;
// how do I set the row where the actual data starts?
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
I get the below result:
Do let me know if you would like me to expand on any point.
thanks!
EDIT:
New linked post created here trying to resolve the same objective, but in a different way but getting a new error: Filestream and datagridview memory issue with CsvHelper
Upvotes: 0
Views: 5174
Reputation: 9104
I came up with another approach that allows you to skip the lines to the header and then to the records.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.IgnoreBlankLines = false;
// skip to header
for (int i = 0; i < 3; i++)
{
csv.Read();
}
csv.ReadHeader();
var headers = csv.Context.HeaderRecord;
// skip to records
for (int i = 0; i < 6; i++)
{
csv.Read();
}
var dt = new DataTable();
foreach (var header in headers)
{
dt.Columns.Add(header);
}
while (csv.Read())
{
var row = dt.NewRow();
for (int i = 0; i < headers.Length; i++)
{
row[i] = csv.GetField(i);
}
dt.Rows.Add(row);
}
}
Upvotes: 0
Reputation: 9104
I can get it to work with ShouldSkipRecord
. The only problem is it will fail if any of the random lines has a "|" delimiter in it.
using (var reader = new StreamReader(filepath))
using (var csv = new CsvReader(reader, CultureInfo.InvariantCulture))
{
csv.Configuration.Delimiter = "|"; // Set delimiter
csv.Configuration.ShouldSkipRecord = row => row.Length == 1;
using (var dr = new CsvDataReader(csv))
{
var dt = new DataTable();
dt.Load(dr);
dgvTst04_View.DataSource = dt; // Set datagridview source to datatable
}
}
If you know how many columns there are, you could set it to skip any rows that have less than that many columns.
csv.Configuration.ShouldSkipRecord = row => row.Length < 6;
Upvotes: 1