Digital

Reputation: 53

Parsing a delimited file with unescaped multiline fields

I have a CSV file which has a delimiter of "|" to separate the fields.

I am using the code below to read the file and put each field into its own List:

 var reader = new StreamReader(File.OpenRead(openFileDialog1.FileName));
 List<string> list1 = new List<string>();
 List<string> list2 = new List<string>();
 List<string> list3 = new List<string>();
 List<string> list4 = new List<string>();

 while (!reader.EndOfStream)
 {
     var line = reader.ReadLine();
     var values = line.Split('|');

     list1.Add(values[0]);
     list2.Add(values[1]);
     list3.Add(values[2]);
     list4.Add(values[3]);
 }

Then I put the lists into a DataSet:

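DataSet ds = new DataSet();
DataTable table = ds.Tables.Add("barcode");

// Rows.Add throws unless the table has columns; these names are placeholders.
table.Columns.Add("Field1");
table.Columns.Add("Field2");
table.Columns.Add("Field3");
table.Columns.Add("Field4");

// The loop starts at 1, so the first line read from the file is skipped.
for (int i = 1; i < list1.Count; i++)
{
    table.Rows.Add(list1[i], list2[i], list3[i], list4[i]);
}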

This works fine if the data looks like this:

373|A0000006-04|EACH|2600003347225
373|A0000006-04|EACH|9556076004684
373|A0000006-04|EACH|9556076006374
373|A0000006-04|PK12|2600003347232
373|A0000006-04|PK12|9556076004691

However, some of the data might look like this:

373|A0000029-01|PK12|1899886
6604250
373|A0000029-01|PK12|2652357563394
373|A0000030-01|EACH|2600001
539189
373|A0000030-01|EACH|8998866604284

As you can see, some rows are split across 2 lines. Is there any way I can read them as a single row instead of 2 different rows? Or do I have to add a delimiter such as a comma or semicolon to identify them as the same row?

Upvotes: 4

Views: 1486

Answers (4)

Steve

Reputation: 216293

A List<T> can also be accessed by index. You could add a lineCounter to your loop and, if a line yields just one part after splitting, append its content to the previous element of list4. (This assumes that at least the first line has all 4 elements.)

// Counts the complete (4-field) rows added so far.
int lineCounter = 0;
while (!reader.EndOfStream)
{
    var line = reader.ReadLine();
    var values = line.Split('|');

    if (values.Length == 1)
    {
        // No delimiter on this line: it continues the last field of the previous row.
        list4[lineCounter - 1] += values[0];
    }
    else
    {
        list1.Add(values[0]);
        list2.Add(values[1]);
        list3.Add(values[2]);
        list4.Add(values[3]);
        lineCounter++;
    }
}

I have tested this with the sample data provided by the OP, and it seems to work well.
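
The same idea can be sketched as a self-contained helper that returns whole rows instead of four parallel lists. This is only a sketch under a few assumptions: the class name PipeFileReader, the method ReadRows, and the fieldCount parameter are made up for illustration, a line break is assumed to occur only inside the last field, fieldCount is assumed to be greater than 1, and trimming surrounding whitespace from each field is my own choice rather than something the question asks for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PipeFileReader
{
    // Hypothetical helper (not from the original answer).
    // Reads '|'-delimited rows and appends any delimiter-free line
    // to the last field of the row that precedes it.
    public static List<string[]> ReadRows(TextReader reader, int fieldCount = 4)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Length == 0)
                continue; // skip blank lines

            var values = line.Split('|');
            if (values.Length == 1 && rows.Count > 0)
            {
                // Continuation line: glue it onto the previous row's last field.
                rows[rows.Count - 1][fieldCount - 1] += values[0].Trim();
            }
            else if (values.Length == fieldCount)
            {
                // Complete row: trim each field (an assumption) and store it.
                for (int i = 0; i < values.Length; i++)
                    values[i] = values[i].Trim();
                rows.Add(values);
            }
            else
            {
                // Anything else does not fit the assumed layout.
                throw new FormatException("Unexpected field count in line: " + line);
            }
        }
        return rows;
    }
}
```

It could be called with a StreamReader opened on openFileDialog1.FileName (ideally inside a using block so the file is closed), and each returned string[] can then be passed to Rows.Add on the table.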

Upvotes: 2

jordanhill123

Reputation: 4182

I've used the FileHelpers library for mapping directly to strongly typed arrays. If you are working with formal CSV, it will work for you.

If it's just delimited data with no formal specification, you might need some other solution.

Upvotes: 0

z a

Reputation: 1

According to the CSV file specification (RFC 4180, http://www.ietf.org/rfc/rfc4180.txt), each record should be located on a separate line, and a field may contain a line break only if the field is enclosed in double quotes. Your file has unquoted line breaks inside fields, so you really need some sort of workaround, such as using another separator to mark line breaks.

Upvotes: 0

Giorgi

Reputation: 30873

Use a library such as A Fast CSV Reader, which supports all the features you need.

Upvotes: 3
