Reputation: 7691
I have a webpage that is used to submit a CSV file to the server. I have to validate the file, for stuff like correct number of columns, correct data type, cross field validations, data-range validations, etc. And finally either show a successful message or return a CSV with error messages and line numbers.
Currently every row and every column is looped through to find out all the errors in the CSV file. But it becomes very slow for bigger files, sometimes resulting in a server time-out. Can someone please suggest a better way to do this.
Thanks
Upvotes: 1
Views: 8450
Reputation: 18430
To validate a CSV file you will surely need to check each column. The only best way if possible in your scenario is to validate the entry itself while appending to the CSV file..
As pinpointed an error by @accolaum, i have edited my code
It will only work provided each row is delimited with a `\n`
IF you only want to Validate number of Columns.. then its easier.. Just take the mod of all the entries with the num of columns
bool file_isvalid;
string data = streamreader.ReadLine();
while(data != null)
{
if(data.Split(',').Length % Num_Of_Columns == 0)
{
file_isvalid = true;
//Perform opertaion
}
else
{
file_isvalid = false;
//Perform Operation
}
data = streamreader.ReadLine();
}
Hope it helps
Upvotes: 2
Reputation: 6765
Validation of csv data usually always needs to look at every single cell. Can you post some of your code, there may be ways to optimse it.
EDIT
in most cases this is the best solution
foreach(row) {
foreach (column) {
validate cell
}
}
if you were really keen, you could try something with regex's
foreach(row) {
validate row by regex
}
but then you are really just off loading the validation code from you to the regex, and i really hate using regexs
Upvotes: 0
Reputation: 12856
I would suggest a rule based approach, similar to unit tests. Think of every! error that can possibly occour and order them in increasing abstraction level
These rules could also have automatic fixes. So if you could automatically detect the encoding, you could correct it before testing all the rules.
Implementation could be done using the command pattern
public abstract class RuleBase
{
public abstract bool Test();
public virtual bool CanCorrect()
{
return false;
}
}
Then create a subclass for each test you want to make and put them in a list.
The timeout can be overcome by using a background thread only for test incoming files. The user has to wait till his file is validated and becomes "active". When finished you can forward him to the next page.
Upvotes: 1
Reputation: 28718
You may be able to optimize your code to perform faster, but what you really want to do is to spawn a worker thread to do the processing.
Two benefits of this
It is not good design to have the user waiting for long running processes to complete - they should be given updates or notifications, rather than just a 'loading' icon on their browser.
edit: This is my answer because (1) I can't recommend code improvements without seeing your code, and (2) efficiency improvements are probably only going to yield incremental improvements (unless you are doing something really wrong), which won't solve your problem long term.
Upvotes: 1