I occasionally get data that is not completely clean, and during runtime I get error messages because the data doesn't match the expected type. For example, sometimes the data has a string where there should be an int, or an int where there should be a date.
Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?
Here's my code, which works when the data is clean:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class TestScore
{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

//read data: each line becomes an array of tab-separated fields
var Data = File.ReadLines(FilePath).Select(line => line.Split('\t')).ToArray();

//select data (still strings at this point)
var query = from x in Data
            select new { Name = x[3], Age = x[1], Date = x[2], Time = x[5], Score = x[7] };

//create List and put data into List;
//the Parse calls throw at run time when a field doesn't match the expected type
List<TestScore> Results = new List<TestScore>();
foreach (var q in query)
{
    TestScore TS = new TestScore();
    TS.Name = q.Name;
    TS.Age = int.Parse(q.Age);
    TS.Date = DateTime.Parse(q.Date);
    TS.Time = DateTime.Parse(q.Time);
    TS.Score = double.Parse(q.Score);
    Results.Add(TS);
}
To avoid finding the errors at run time, the best thing I can think of is to correct the data manually before your program runs.
But to be more constructive: I think using a static readonly field to indicate a data error would be helpful. The following is a simple example that discards the lines that fail to parse; you might want to modify it if you need more advanced handling.
using System;
using System.IO;
using System.Linq;

public partial class TestScore {
    // Returns TestScore.Error if the line has too few fields or any field fails to parse.
    // Assumes the columns are: Name, Age, Date, Time, Score.
    public static TestScore Parse(String plainText) {
        var strings=plainText.Split('\t');
        var result=new TestScore();
        if(
            strings.Length<5
            ||
            !double.TryParse(strings[4], out result.Score)
            ||
            !DateTime.TryParse(strings[3], out result.Time)
            ||
            !DateTime.TryParse(strings[2], out result.Date)
            ||
            !int.TryParse(strings[1], out result.Age)
            )
            return TestScore.Error;
        result.Name=strings[0];
        return result;
    }
    public String Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
    // shared sentinel instance meaning "this line could not be parsed"
    public static readonly TestScore Error=new TestScore();
}

public static partial class TestClass {
    public static void TestMethod() {
        var path=@"some tab-separated file";
        var lines=File.ReadAllLines(path);
        var format=""
            +"Name: {0}; Age: {1}; "
            +"Date: {2:yyyy-MM-dd}; Time: {3:HH:mm}; "
            +"Score: {4}";
        var list=(
            from line in lines
            where String.Empty!=line
            let result=TestScore.Parse(line)
            where TestScore.Error!=result // drop lines that failed to parse
            select result).ToList();
        foreach(var item in list) {
            Console.WriteLine(
                format,
                item.Name, item.Age, item.Date, item.Time, item.Score
                );
        }
    }
}
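If you'd rather know which lines failed instead of silently dropping them (the "advanced handling" mentioned above), one option is a TryParse-style method that returns false on failure, plus a scan that records the failing line numbers. This is a minimal sketch, not the answer's own code: `TestScore.TryParse` and `TestScoreScanner.Scan` are hypothetical names, it assumes the same five-column layout as `Parse` (Name, Age, Date, Time, Score), and it parses with `CultureInfo.InvariantCulture` so the outcome doesn't depend on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public class TestScore
{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;

    // Hypothetical TryParse-style variant of Parse: reports failure through
    // the return value instead of handing back a shared Error sentinel.
    // Assumes the column order Name, Age, Date, Time, Score.
    public static bool TryParse(string plainText, out TestScore result)
    {
        result = null;
        string[] fields = plainText.Split('\t');
        if (fields.Length < 5) return false;

        var parsed = new TestScore { Name = fields[0] };
        CultureInfo inv = CultureInfo.InvariantCulture;
        if (!int.TryParse(fields[1], NumberStyles.Integer, inv, out parsed.Age)
            || !DateTime.TryParse(fields[2], inv, DateTimeStyles.None, out parsed.Date)
            || !DateTime.TryParse(fields[3], inv, DateTimeStyles.None, out parsed.Time)
            || !double.TryParse(fields[4], NumberStyles.Float, inv, out parsed.Score))
        {
            return false;
        }

        result = parsed;
        return true;
    }
}

public static class TestScoreScanner
{
    // Parses every line: good rows are appended to 'scores', and the
    // 1-based numbers of the lines that failed are returned.
    public static List<int> Scan(IEnumerable<string> lines, List<TestScore> scores)
    {
        var badLines = new List<int>();
        int lineNumber = 0;
        foreach (string line in lines)
        {
            lineNumber++;
            if (line.Length == 0) continue; // skip blank lines, as the answer does

            TestScore item;
            if (TestScore.TryParse(line, out item))
                scores.Add(item);
            else
                badLines.Add(lineNumber);
        }
        return badLines;
    }
}
```

You'd call it as `TestScoreScanner.Scan(File.ReadLines(path), scores)`; if the returned list isn't empty, print those line numbers and fix them all in one go before trusting `scores`.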
Is there a way to scan the data first for bad data, so that I can fix it all at once instead of finding out during run-time and fixing it iteratively?
Scanning is a runtime operation. However, it's fairly straightforward to implement a solution that gives you enough information to "fix it all at once".
The following code shows a pattern for validating the entire file; it doesn't attempt to load any data unless every row passes validation.
If validation fails, a collection of all the errors encountered is returned.
using System.Collections.Generic;
using System.IO;
using System.Linq;

internal sealed class ParseStatus
{
    internal bool IsSuccess;
    internal IReadOnlyList<string> Messages;
}

private ParseStatus Load()
{
    string filePath = "foo";
    var data = File.ReadLines( filePath ).Select( line => line.Split( '\t' ) ).ToArray();

    var errors = new List<string>();
    int row = 0;

    // first pass: look for errors by testing each value
    foreach( var x in data )
    {
        row++;

        // check the field count first; indexing a short row would throw
        if( x.Length < 8 )
        {
            errors.Add( "Expected 8 fields but found " + x.Length + " on line " + row );
            continue;
        }

        int dummy;
        if( !int.TryParse( x[1], out dummy ) ) // x[1] is Age
        {
            errors.Add( "Age couldn't be parsed as an int on line " + row );
        }

        // etc...use exception-free checks on each property
    }

    if( errors.Count > 0 )
    {
        // quit, and return errors list
        return new ParseStatus { IsSuccess = false, Messages = errors };
    }

    // otherwise, it is safe to load all rows
    // TODO: second pass: load the data
    return new ParseStatus { IsSuccess = true, Messages = errors };
}
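For reference, here is one way the first pass could check every field and the TODO second pass could be filled in, as a self-contained sketch rather than a drop-in implementation. It assumes the column positions from the question's query (Name = 3, Age = 1, Date = 2, Time = 5, Score = 7) and takes the lines as an `IEnumerable<string>`, so you'd call it as `TwoPassLoader.Load(File.ReadLines(filePath))`; `TwoPassLoader` and `LoadResult` are hypothetical names. It checks the field count before touching any column, so a short line is reported as an error instead of throwing `IndexOutOfRangeException`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public class TestScore
{
    public string Name;
    public int Age;
    public DateTime Date;
    public DateTime Time;
    public double Score;
}

public sealed class LoadResult
{
    public bool IsSuccess;
    public IReadOnlyList<string> Messages;
    public List<TestScore> Scores; // only set when IsSuccess is true
}

public static class TwoPassLoader
{
    // Column positions from the question's query; adjust to match your file.
    const int NameCol = 3, AgeCol = 1, DateCol = 2, TimeCol = 5, ScoreCol = 7;
    const int MinFields = 8;
    static readonly CultureInfo Inv = CultureInfo.InvariantCulture;

    public static LoadResult Load(IEnumerable<string> lines)
    {
        string[][] rows = lines.Select(line => line.Split('\t')).ToArray();
        var errors = new List<string>();

        // First pass: check every field of every row and collect ALL problems.
        for (int i = 0; i < rows.Length; i++)
        {
            string[] x = rows[i];
            string prefix = "Line " + (i + 1) + ": ";
            if (x.Length < MinFields)
            {
                errors.Add(prefix + "expected " + MinFields + " fields, found " + x.Length);
                continue; // the column checks below would index past the end
            }

            int age; DateTime dt; double score;
            if (!int.TryParse(x[AgeCol], NumberStyles.Integer, Inv, out age))
                errors.Add(prefix + "Age '" + x[AgeCol] + "' is not an int");
            if (!DateTime.TryParse(x[DateCol], Inv, DateTimeStyles.None, out dt))
                errors.Add(prefix + "Date '" + x[DateCol] + "' is not a date");
            if (!DateTime.TryParse(x[TimeCol], Inv, DateTimeStyles.None, out dt))
                errors.Add(prefix + "Time '" + x[TimeCol] + "' is not a time");
            if (!double.TryParse(x[ScoreCol], NumberStyles.Float, Inv, out score))
                errors.Add(prefix + "Score '" + x[ScoreCol] + "' is not a number");
        }

        if (errors.Count > 0)
            return new LoadResult { IsSuccess = false, Messages = errors };

        // Second pass: every value passed TryParse with the same styles and
        // culture, so these Parse calls should not throw.
        List<TestScore> scores = rows.Select(r => new TestScore
        {
            Name = r[NameCol],
            Age = int.Parse(r[AgeCol], NumberStyles.Integer, Inv),
            Date = DateTime.Parse(r[DateCol], Inv, DateTimeStyles.None),
            Time = DateTime.Parse(r[TimeCol], Inv, DateTimeStyles.None),
            Score = double.Parse(r[ScoreCol], NumberStyles.Float, Inv)
        }).ToList();

        return new LoadResult { IsSuccess = true, Messages = errors, Scores = scores };
    }
}
```

Validating with TryParse and then converting again with Parse does the work twice; that's the cost of never building any results until the whole file is known to be good. Using the same `NumberStyles` and culture in both passes keeps the two passes in agreement.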