MrPatterns

Reputation: 4434

Is there a way to check, in advance of runtime, whether the data is the correct type?

I occasionally get data that is not completely clean, and during runtime I get error messages because the data doesn't match the expected type. For example, sometimes the data has a string where there should be an int, or an int where there should be a date.

Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?

Here's my code, which works when the data is clean:

class TestScore{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

//read data
var Data = File.ReadLines(FilePath).Select(line => line.Split('\t')).ToArray();

//select data
var query = (from x in Data
             select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] }).ToList();

//create List and put data into List
List<TestScore> Results = new List<TestScore>();

for (int i = 0; i < query.Count; i++)
{
       TestScore TS = new TestScore();

       //each Parse call throws a FormatException at runtime if the field is malformed
       TS.Name = query[i].Name;
       TS.Age = int.Parse(query[i].Age);
       TS.Date = DateTime.Parse(query[i].Date);
       TS.Time = DateTime.Parse(query[i].Time);
       TS.Score = double.Parse(query[i].Score);

       Results.Add(TS);
}

Upvotes: 0

Views: 133

Answers (2)

Ken Kin

Reputation: 4703

If you want to avoid discovering the errors at run-time altogether, the best thing I can think of is to correct the data manually before your program runs.

But since we are trying to be constructive, I think a static readonly field that marks a data error would be helpful. The following is a simple example that skips the items that fail to parse; you might want to modify it if you need more advanced handling.

public partial class TestScore {
    public static TestScore Parse(String plainText) {
        var strings=plainText.Split('\t');
        var result=new TestScore();

        if(
            strings.Length<5
            ||
            !double.TryParse(strings[4], out result.Score)
            ||
            !DateTime.TryParse(strings[3], out result.Time)
            ||
            !DateTime.TryParse(strings[2], out result.Date)
            ||
            !int.TryParse(strings[1], out result.Age)
            )
            return TestScore.Error;

        result.Name=strings[0];
        return result;
    }

    public String Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;

    public static readonly TestScore Error=new TestScore();
}

public static partial class TestClass {
    public static void TestMethod() {
        var path=@"some tab splitted file";

        var lines=File.ReadAllLines(path);

        var format=""
            +"Name: {0}; Age: {1}; "
            +"Date: {2:yyyy:MM:dd}; Time {3:hh:mm}; "
            +"Score: {4}";

        var list=(
            from line in lines
            where String.Empty!=line
            let result=TestScore.Parse(line)
            where TestScore.Error!=result
            select result).ToList();

        foreach(var item in list) {
            Console.WriteLine(
                format,
                item.Name, item.Age, item.Date, item.Time, item.Score
                );
        }
    }
}
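
If the failed rows should be reported rather than silently skipped, one possible variation is sketched below. It is only an illustration under assumptions: the names TestScoreParser, TryParse and ParseAll are hypothetical, the column order follows the example above (Name, Age, Date, Time, Score), and the input is assumed to use invariant-culture number and date formats.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public sealed class TestScore
{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

// Hypothetical helper, not part of the answer above.
public static class TestScoreParser
{
    // Returns true and fills 'score' on success; otherwise returns false
    // and describes the first field that failed in 'error'.
    public static bool TryParse(string line, out TestScore score, out string error)
    {
        score = null;
        error = null;
        var culture = CultureInfo.InvariantCulture; // assumption: invariant-culture input
        var fields = line.Split('\t');

        if (fields.Length < 5)
        {
            error = "expected 5 fields, found " + fields.Length;
            return false;
        }

        int age;
        if (!int.TryParse(fields[1], NumberStyles.Integer, culture, out age))
        {
            error = "Age is not an int: '" + fields[1] + "'";
            return false;
        }

        DateTime date;
        if (!DateTime.TryParse(fields[2], culture, DateTimeStyles.None, out date))
        {
            error = "Date is not a date: '" + fields[2] + "'";
            return false;
        }

        DateTime time;
        if (!DateTime.TryParse(fields[3], culture, DateTimeStyles.None, out time))
        {
            error = "Time is not a time: '" + fields[3] + "'";
            return false;
        }

        double value;
        if (!double.TryParse(fields[4], NumberStyles.Float, culture, out value))
        {
            error = "Score is not a number: '" + fields[4] + "'";
            return false;
        }

        score = new TestScore { Name = fields[0], Age = age, Date = date, Time = time, Score = value };
        return true;
    }

    // Parses every non-empty line; rows that fail are described in 'rejected'
    // with their 1-based line number instead of being dropped silently.
    public static List<TestScore> ParseAll(IEnumerable<string> lines, List<string> rejected)
    {
        var accepted = new List<TestScore>();
        int row = 0;
        foreach (var line in lines)
        {
            row++;
            if (line.Length == 0) continue;

            TestScore score;
            string error;
            if (TryParse(line, out score, out error))
                accepted.Add(score);
            else
                rejected.Add("line " + row + ": " + error);
        }
        return accepted;
    }
}
```

Calling TestScoreParser.ParseAll(File.ReadLines(path), rejected) would then give both the usable rows and a list of messages to act on. Unlike the shared Error instance, this reports which line and which field failed, at the cost of a slightly longer parser.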

Upvotes: 2

Tim M.

Reputation: 54387

"Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?"

Scanning is a runtime operation. However, it's fairly straightforward to implement a solution that gives you enough information to "fix it all at once".

The following code shows a pattern that validates the file in its entirety and doesn't attempt to load any data unless every row passes validation.

If it fails, a collection of all errors encountered is returned.

internal sealed class ParseStatus
{
    internal bool IsSuccess;
    internal IReadOnlyList<string> Messages;
}

private ParseStatus Load()
{
    string filePath = "foo";

    var data = File.ReadLines( filePath ).Select( line => line.Split( '\t' ) ).ToArray();
    var results = from x in data
                    select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] };

    var errors = new List<string>();
    int row = 0;

    // first pass: look for errors by testing each value
    foreach( var line in results )
    {
        row++;

        int dummy;
        if( !int.TryParse( line.Age, out dummy ) )
        {
            errors.Add( "Age couldn't be parsed as an int on line " + row );
        }

        // etc...use exception-free checks on each property
    }

    if( errors.Count > 0 )
    {
        // quit, and return errors list
        return new ParseStatus { IsSuccess = false, Messages = errors };
    }

    // otherwise, it is safe to load all rows

    // TODO: second pass: load the data

    return new ParseStatus { IsSuccess = true };
}
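
The "etc..." step in the first pass could be filled in along these lines. This is a sketch under assumptions, not the only way to do it: RowValidator and Validate are hypothetical names, the column positions are the ones used in the question's query (Age = x[1], Date = x[2], Time = x[5], Score = x[7]), and invariant-culture formats are assumed for numbers and dates.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper illustrating the exception-free checks for every typed column.
public static class RowValidator
{
    // Column positions taken from the question's query.
    private const int AgeCol = 1, DateCol = 2, TimeCol = 5, ScoreCol = 7;

    // Returns one message per bad value; an empty list means every row is safe to load.
    public static List<string> Validate(IEnumerable<string> lines)
    {
        var culture = CultureInfo.InvariantCulture; // assumption: invariant-culture input
        var errors = new List<string>();
        int row = 0;

        foreach (var line in lines)
        {
            row++;
            var fields = line.Split('\t');

            // A short row would otherwise cause an IndexOutOfRangeException.
            if (fields.Length <= ScoreCol)
            {
                errors.Add("Line " + row + ": expected at least " + (ScoreCol + 1)
                           + " columns, found " + fields.Length);
                continue;
            }

            int ageDummy;
            DateTime dateDummy;
            double scoreDummy;

            if (!int.TryParse(fields[AgeCol], NumberStyles.Integer, culture, out ageDummy))
                errors.Add("Line " + row + ": Age couldn't be parsed as an int");

            if (!DateTime.TryParse(fields[DateCol], culture, DateTimeStyles.None, out dateDummy))
                errors.Add("Line " + row + ": Date couldn't be parsed as a DateTime");

            if (!DateTime.TryParse(fields[TimeCol], culture, DateTimeStyles.None, out dateDummy))
                errors.Add("Line " + row + ": Time couldn't be parsed as a DateTime");

            if (!double.TryParse(fields[ScoreCol], NumberStyles.Float, culture, out scoreDummy))
                errors.Add("Line " + row + ": Score couldn't be parsed as a double");
        }

        return errors;
    }
}
```

Because every check runs for every row, a single pass reports all bad values in the file, which is what makes it possible to fix them all at once before loading.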

Upvotes: 2
