Reputation: 2271
I'm working on a C# program that reformats CSV files. It imports a CSV file and copies certain columns into a new CSV file. I'm getting a System.IndexOutOfRangeException with this code:
using System;
using System.Collections;
using System.Linq;

class CSVFiles
{
    static void Main(string[] args)
    {
        // Create the IEnumerable data source
        string[] lines = System.IO.File.ReadAllLines(@"presta.csv");

        // Create the query. Split each line on ';' and build
        // the output line from selected fields of the old line
        IEnumerable query =
            from line in lines
            let x = line.Split(';')
            select x[0] + ", base, 0, " + x[0] + ", " + x[7] + ", " + x[1] + ", " + x[2] + ", " + x[3] + ", " + x[15] + ", " + x[4] + ", " + x[6] + ", " + x[7] + ", Sí, " + x[12] + ", " + x[12] + ", " + x[12] + ", " + x[12];

        // Execute the query and write out the new file. Note that WriteAllLines
        // takes a string[], so ToArray is called on the query.
        System.IO.File.WriteAllLines(@"outlet.csv", query.Cast<String>().ToArray());

        Console.WriteLine("outlet.csv written to disk. Press any key to exit");
        Console.ReadKey();
    }
}
The imported CSV has 16 columns, so it should be indexed to x[17]. Can anyone help me with this? Or is there a better way to do this?
Here is the entire debug output:
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_32\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\assembly\GAC_MSIL\Microsoft.VisualStudio.HostingProcess.Utilities\11.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.HostingProcess.Utilities.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Drawing\v4.0_4.0.0.0__b03f5f7f11d50a3a\System.Drawing.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System\v4.0_4.0.0.0__b77a5c561934e089\System.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\assembly\GAC_MSIL\Microsoft.VisualStudio.HostingProcess.Utilities.Sync\11.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.HostingProcess.Utilities.Sync.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\assembly\GAC_MSIL\Microsoft.VisualStudio.Debugger.Runtime\11.0.0.0__b03f5f7f11d50a3a\Microsoft.VisualStudio.Debugger.Runtime.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'c:\users\daniel\documents\visual studio 2012\Projects\CSVConverter\CSVConverter\bin\Debug\CSVConverter.vshost.exe'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Core\v4.0_4.0.0.0__b77a5c561934e089\System.Core.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Xml.Linq\v4.0_4.0.0.0__b77a5c561934e089\System.Xml.Linq.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Data.DataSetExtensions\v4.0_4.0.0.0__b77a5c561934e089\System.Data.DataSetExtensions.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\Microsoft.CSharp\v4.0_4.0.0.0__b03f5f7f11d50a3a\Microsoft.CSharp.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_32\System.Data\v4.0_4.0.0.0__b77a5c561934e089\System.Data.dll'
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'C:\Windows\Microsoft.Net\assembly\GAC_MSIL\System.Xml\v4.0_4.0.0.0__b77a5c561934e089\System.Xml.dll'
The thread 'vshost.NotifyLoad' (0x52c) has exited with code 0 (0x0).
The thread 'vshost.LoadReference' (0x6cc) has exited with code 0 (0x0).
'CSVConverter.vshost.exe' (Managed (v4.0.30319)): Loaded 'c:\users\daniel\documents\visual studio 2012\Projects\CSVConverter\CSVConverter\bin\Debug\CSVConverter.exe', Symbols loaded.
A first chance exception of type 'System.IndexOutOfRangeException' occurred in CSVConverter.exe
An unhandled exception of type 'System.IndexOutOfRangeException' occurred in CSVConverter.exe
Additional information: Index was outside the bounds of the array.
The program '[6952] CSVConverter.vshost.exe: Managed (v4.0.30319)' has exited with code -1073741510 (0xc000013a).
Upvotes: 0
Views: 1499
Reputation: 450
I'm not sure I understood, but I'll give it my best shot. If you have 16 items in an array, then the index of the last item in the array would be x[15] since arrays in most languages start counting at 0 instead of 1. The index of the first item in the array is x[0].
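To make the zero-based indexing concrete, here is a small sketch using a made-up 16-field line (the sample data and class name are hypothetical, not taken from the asker's file). Splitting it gives an array of length 16 whose valid indexes run from 0 to 15, and x[16] throws the same exception as in the question.

```csharp
using System;

public static class ZeroBasedIndexDemo
{
    // Hypothetical sample line with 16 semicolon-separated fields, c0 through c15.
    public const string SampleLine = "c0;c1;c2;c3;c4;c5;c6;c7;c8;c9;c10;c11;c12;c13;c14;c15";

    public static void Main()
    {
        string[] x = SampleLine.Split(';');

        Console.WriteLine(x.Length);        // 16 fields...
        Console.WriteLine(x[0]);            // ...the first one is x[0] ("c0")...
        Console.WriteLine(x[x.Length - 1]); // ...and the last one is x[15] ("c15")

        try
        {
            Console.WriteLine(x[16]);       // one past the end
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("x[16] throws IndexOutOfRangeException");
        }
    }
}
```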
Another thing I might add is that it looks like you are getting an array, turning it into an IEnumerable, then turning it back into an array without using any of the fancy stuff IEnumerable provides. I'd suggest using a foreach loop for this task instead.
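A minimal sketch of the foreach approach, assuming the same file names and column mapping as the question; CsvReformatter and ReformatLine are hypothetical names. File.ReadLines streams the input one line at a time and the StreamWriter writes each result immediately, so no intermediate IEnumerable or array is built. The length check is an added assumption that lines with fewer than 16 fields should be skipped rather than crash the program.

```csharp
using System;
using System.IO;

public static class CsvReformatter
{
    // Builds one output line from the semicolon-split fields, using the same
    // column mapping as the query in the question (highest index used: x[15]).
    public static string ReformatLine(string[] x)
    {
        return x[0] + ", base, 0, " + x[0] + ", " + x[7] + ", " + x[1] + ", " + x[2] + ", " + x[3] + ", "
             + x[15] + ", " + x[4] + ", " + x[6] + ", " + x[7] + ", Sí, "
             + x[12] + ", " + x[12] + ", " + x[12] + ", " + x[12];
    }

    public static void Main()
    {
        using (var writer = new StreamWriter("outlet.csv"))
        {
            // File.ReadLines yields lines lazily instead of loading the whole file.
            foreach (string line in File.ReadLines("presta.csv"))
            {
                string[] x = line.Split(';');
                if (x.Length < 16)
                    continue; // assumption: skip blank or short lines instead of throwing

                writer.WriteLine(ReformatLine(x));
            }
        }
        Console.WriteLine("outlet.csv written to disk.");
    }
}
```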
Best of luck and hopefully this helped!
Upvotes: 0
Reputation: 74307
Reading delimited text files is not as simple as it might first appear.
If your semicolon-delimited file has 16 columns, the array resulting from splitting a line should be of length 16 (meaning the highest offset into the array is +15). It might be less if any of the following is true for any line in the source data:
You might wind up with more columns than you think, too. The primary reason for this is that data, tainted as it is with the impurity of the world, is often unclean. People have been known to litter data with delimiter characters, such as commas or semicolons, so when you do a naive Split() on the text, you don't always get what you want. This is especially true for "CSV" files, the format being rather [cough] loosely defined, and even more loosely implemented.
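To illustrate the problem, here is a rough sketch of a quote-aware splitter; QuotedSplit and SplitFields are hypothetical names. It assumes that a field containing the delimiter is wrapped in double quotes and that "" inside quotes means a literal quote. It deliberately ignores quoted fields spanning several physical lines, so treat it as a demonstration rather than a replacement for a tested parser such as the one mentioned next, or Microsoft.VisualBasic.FileIO.TextFieldParser, which ships with .NET.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplit
{
    // Splits one line on `delimiter`, treating text inside double quotes as part of
    // a single field and "" inside quotes as an escaped quote character.
    // Does NOT handle quoted fields that continue onto the next line.
    public static string[] SplitFields(string line, char delimiter)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote ""
                        i++;
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // delimiters inside quotes are kept as text
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());
        return fields.ToArray();
    }

    public static void Main()
    {
        // Hypothetical line whose second field contains the delimiter.
        string line = "42;\"Smith; John\";Madrid";

        Console.WriteLine(line.Split(';').Length);        // 4: naive Split breaks the name apart
        Console.WriteLine(SplitFields(line, ';').Length); // 3
        Console.WriteLine(SplitFields(line, ';')[1]);     // Smith; John
    }
}
```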
You might want to look at using Sebastien Lorion's Fast CSV Reader from CodeProject for this. It works quite well and takes care of a lot of the...unexpected cases you might encounter.
Other references you might want to take a look at:
Edited to note: The Library of Congress seems to have weighed in on the CSV format as well: http://www.digitalpreservation.gov/formats/fdd/fdd000323.shtml
Upvotes: 0
Reputation: 16296
You may have an extra line break (especially at the end of the file), which produces a blank line. To work around it, you can add a where clause to your query:
from line in lines
where !String.IsNullOrEmpty(line)
...
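A sketch of why a blank line breaks the original query and how the where filter, placed before the Split, avoids it. It uses String.IsNullOrWhiteSpace, a small variation on the answer's IsNullOrEmpty that also skips lines containing only spaces or tabs; the sample data and the SecondColumn helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlankLineFilter
{
    // Filters out blank lines before splitting, then reads the second field.
    // Without the where clause, an empty line would make x[1] throw.
    public static List<string> SecondColumn(string[] lines)
    {
        var query =
            from line in lines
            where !String.IsNullOrWhiteSpace(line)
            let x = line.Split(';')
            select x[1];
        return query.ToList();
    }

    public static void Main()
    {
        // Hypothetical input: two data lines with an empty line between them.
        string[] lines = { "a;b;c", "", "d;e;f" };

        // An empty string splits into ONE empty field, so only x[0] exists.
        Console.WriteLine("".Split(';').Length);                  // 1
        Console.WriteLine(string.Join(",", SecondColumn(lines))); // b,e
    }
}
```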
Upvotes: 0
Reputation: 48114
You said "The imported CSV has 16 columns, so it should be indexed to x[17]", but that is wrong. Arrays are zero-indexed, so if the CSV has 16 columns, x[15] is the final column. Any larger index will throw an IndexOutOfRangeException.
EDIT: Looking at your code, I noticed that you never access anything beyond x[15], so the first issue probably isn't responsible for your crash. Here's another suggestion: add some bounds checking. I would assume that the Split in your LINQ query is splitting an incomplete line, and you then try to access indexes that don't exist (i.e. the line only has 4 items and should be ignored, but your code assumes it has 16 and tries to access an out-of-range index on that line). If you split a line and are going to access indexes 0 through n, first check that the array's length is greater than n.
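One way to add that bounds check, sketched under the assumption that every usable line needs at least 16 fields (the question's query reads up to x[15]). UsableRows filters inside the LINQ query before any indexing happens, and ShortLines reports the 1-based line numbers that fail the check; both names and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BoundsCheckedQuery
{
    // Assumption: the highest index used is x[15], so a line needs at least 16 fields.
    const int RequiredFields = 16;

    // Keeps only split lines that are long enough to index safely.
    public static List<string[]> UsableRows(string[] lines)
    {
        var query =
            from line in lines
            let x = line.Split(';')
            where x.Length >= RequiredFields // i.e. x.Length > 15, the highest index used
            select x;
        return query.ToList();
    }

    // Returns the 1-based line numbers that have too few fields.
    public static List<int> ShortLines(string[] lines)
    {
        return lines
            .Select((line, index) => new { Number = index + 1, Fields = line.Split(';') })
            .Where(r => r.Fields.Length < RequiredFields)
            .Select(r => r.Number)
            .ToList();
    }

    public static void Main()
    {
        // Hypothetical input: one complete 16-field line and one incomplete line.
        string full = string.Join(";", Enumerable.Range(0, 16).Select(i => "c" + i));
        string[] lines = { full, "only;four;fields;here" };

        Console.WriteLine(UsableRows(lines).Count);             // 1
        Console.WriteLine(string.Join(",", ShortLines(lines))); // 2
    }
}
```

Whether short lines should be skipped silently, logged, or treated as fatal depends on the data; reporting them at least tells you which lines in presta.csv trigger the exception.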
Upvotes: 1