Divi
Divi

Reputation: 7691

Validate CSV file

I have a webpage that is used to submit a CSV file to the server. I have to validate the file, for stuff like correct number of columns, correct data type, cross field validations, data-range validations, etc. And finally either show a successful message or return a CSV with error messages and line numbers.

Currently every row and every column is looped through to find out all the errors in the CSV file. But it becomes very slow for bigger files, sometimes resulting in a server time-out. Can someone please suggest a better way to do this.

Thanks

Upvotes: 1

Views: 8450

Answers (5)

user584539
user584539

Reputation:

You could use XMLReader and parse against an XSD

Upvotes: 0

Shekhar_Pro
Shekhar_Pro

Reputation: 18430

To validate a CSV file you will surely need to check each column. The only best way if possible in your scenario is to validate the entry itself while appending to the CSV file..


Edit

As pinpointed an error by @accolaum, i have edited my code

It will only work provided each row is delimited with a `\n`

IF you only want to Validate number of Columns.. then its easier.. Just take the mod of all the entries with the num of columns

bool file_isvalid;
string data = streamreader.ReadLine();
while(data != null)
{
    if(data.Split(',').Length % Num_Of_Columns == 0)
    {
        file_isvalid = true;
        //Perform opertaion
    }
    else
    {
        file_isvalid = false;
        //Perform Operation
    }
    data = streamreader.ReadLine();
}

Hope it helps

Upvotes: 2

djeeg
djeeg

Reputation: 6765

Validation of csv data usually always needs to look at every single cell. Can you post some of your code, there may be ways to optimse it.

EDIT

in most cases this is the best solution

foreach(row) {
    foreach (column) {
        validate cell
    }
}

if you were really keen, you could try something with regex's

foreach(row) {
    validate row by regex
}

but then you are really just off loading the validation code from you to the regex, and i really hate using regexs

Upvotes: 0

schoetbi
schoetbi

Reputation: 12856

I would suggest a rule based approach, similar to unit tests. Think of every! error that can possibly occour and order them in increasing abstraction level

  • Correct file encoding
  • Correct number of lines/columns
  • correct column headers
  • correct number/text/date formats
  • correct number ranges
  • bussiness rules??
  • ...

These rules could also have automatic fixes. So if you could automatically detect the encoding, you could correct it before testing all the rules.

Implementation could be done using the command pattern

public abstract class RuleBase
{
  public abstract bool Test();
  public virtual bool CanCorrect()
  { 
     return false;
  }
}

Then create a subclass for each test you want to make and put them in a list.

The timeout can be overcome by using a background thread only for test incoming files. The user has to wait till his file is validated and becomes "active". When finished you can forward him to the next page.

Upvotes: 1

Kirk Broadhurst
Kirk Broadhurst

Reputation: 28718

You may be able to optimize your code to perform faster, but what you really want to do is to spawn a worker thread to do the processing.

Two benefits of this

  • You can redirect the user to another page so that they know their request has submitted
  • The worker thread can be given a callback so that it can report its status - if you want to, you could put a progress bar or a percentage on the 'submitted' page so that the user can see as their file is being processed.

It is not good design to have the user waiting for long running processes to complete - they should be given updates or notifications, rather than just a 'loading' icon on their browser.

edit: This is my answer because (1) I can't recommend code improvements without seeing your code, and (2) efficiency improvements are probably only going to yield incremental improvements (unless you are doing something really wrong), which won't solve your problem long term.

Upvotes: 1

Related Questions