Reputation: 605
I need to search a folder containing CSV files. The records I'm interested in have 3 fields: Rec, Country and Year. My job is to search the files and see if any of them has records for more than a single year. Below is the code I have so far:
// Get each individual file from the folder.
string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var queryMatchingFiles =
from file in fileList
where (file.Extension == ".dat" || file.Extension == ".csv")
select file;
Then I came up with this code to read the year field from each file and find the files where the number of distinct years is more than 1 (the count part was not successfully implemented):
public void GetFileData(string filesname, char sep)
{
using (StreamReader reader = new StreamReader(filesname))
{
var recs = (from line in reader.Lines(sep.ToString())
let parts = line.Split(sep)
select parts[2]);
}
}
Below is a sample file:
REC,IE,2014
REC,DE,2014
REC,FR,2015
Now I'm struggling to combine these two ideas to solve my problem in a single query. The query should list the files that have records for more than one year.
Thanks in advance
Upvotes: 1
Views: 550
Reputation: 8867
Something along these lines:
string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
System.IO.SearchOption.AllDirectories);
var fileData =
    from file in fileList
    where (file.Extension == ".dat" || file.Extension == ".csv")
    let name = GetFileData(file.FullName, ',')
    where name != null
    select name;

// Returns the file name if the file contains more than one distinct year, otherwise null.
public string GetFileData(string filename, char sep)
{
    // File.ReadLines (System.IO) reads the lines lazily; the year is the third field.
    var recs = from line in File.ReadLines(filename)
               let parts = line.Split(sep)
               select parts[2];
    var distinctYears = recs.Distinct().Count();
    return distinctYears > 1 ? filename : null;
}
Upvotes: 1
Reputation: 30464
"My job is to search the files and see if any of the files has records for more then a single year."
This specifies that you want a Boolean result: one that says whether any of the files has such records.
For fun I'll extend it a little bit more:
My job is to get the collection of files where any of the records is about more than a single year.
You were almost there. Let's first declare a class with the records in your file:
public class MyRecord
{
public string Rec { get; set; }
public string CountryCode { get; set; }
public int Year { get; set; }
}
I'll write an extension method for the class FileInfo that reads the file and returns the sequence of MyRecords in it.
For extension methods see MSDN Extension Methods (C# Programming Guide)
public static class FileInfoExtension
{
public static IEnumerable<MyRecord> ReadMyRecords(this FileInfo file, char separator)
{
var records = new List<MyRecord>();
using (var reader = new StreamReader(file.FullName))
{
var lineToProcess = reader.ReadLine();
while (lineToProcess != null)
{
var splitLines = lineToProcess.Split(new char[] { separator }, 3);
if (splitLines.Length < 3) throw new InvalidDataException();
var record = new MyRecord()
{
Rec = splitLines[0],
CountryCode = splitLines[1],
Year = Int32.Parse(splitLines[2]),
};
records.Add(record);
lineToProcess = reader.ReadLine();
}
}
return records;
}
}
I could have used string instead of FileInfo, but IMHO a string is something completely different than a filename.
After the above you can write the following:
string startFolder = @"C:\MyFileFolder\";
var directoryInfo = new DirectoryInfo(startFolder);
var allFiles = directoryInfo.EnumerateFiles("*.*", SearchOption.AllDirectories);
var sequenceOfFileRecordCollections = allFiles.Select(file => file.ReadMyRecords(','));
So now you have, per file, a sequence of the MyRecords in that file. You want to know which files have more than one year. Let's add another extension method to class FileInfoExtension:
public static bool IsMultiYear(this FileInfo file, char separator)
{
// read the file, only return true if there are any records,
// and if any record has a different year than the first record
var myRecords = file.ReadMyRecords(separator);
if (myRecords.Any())
{
int firstYear = myRecords.First().Year;
return myRecords.Any(record => record.Year != firstYear);
}
else
{
return false;
}
}
The sequence of files that have more than one year in them is:
allFiles.Where(file => file.IsMultiYear(','));
Putting everything in one statement:
var allFilesWithMultiYear = new DirectoryInfo(@"C:\MyFileFolder\")
.EnumerateFiles("*.*", SearchOption.AllDirectories)
.Where(file => file.IsMultiYear(','));
By creating two fairly simple extension methods your problem became one highly readable statement.
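If only the Boolean answer from the original question is needed, the same idea can be sketched with lazy reading, so that a file is abandoned as soon as a second distinct year turns up. This is a minimal sketch, not part of the answer above: `MultiYearSketch`, `ReadYears`, `HasMultipleYears` and `AnyFileIsMultiYear` are hypothetical names, and it assumes every line has at least three fields with an integer year in the third.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class MultiYearSketch
{
    // Hypothetical helper: lazily yields the year (third field) of each line.
    public static IEnumerable<int> ReadYears(this FileInfo file, char separator)
    {
        foreach (var line in File.ReadLines(file.FullName))
        {
            var parts = line.Split(separator);
            if (parts.Length < 3) throw new InvalidDataException();
            yield return int.Parse(parts[2]);
        }
    }

    // Hypothetical helper: Distinct() is lazy, so Skip(1).Any() stops
    // reading the file once a second distinct year has been seen.
    public static bool HasMultipleYears(this FileInfo file, char separator)
    {
        return file.ReadYears(separator).Distinct().Skip(1).Any();
    }

    // Boolean result: does any .csv file under the folder span several years?
    public static bool AnyFileIsMultiYear(string folder)
    {
        return new DirectoryInfo(folder)
            .EnumerateFiles("*.csv", SearchOption.AllDirectories)
            .Any(file => file.HasMultipleYears(','));
    }
}
```

Calling `MultiYearSketch.AnyFileIsMultiYear(@"C:\MyFileFolder\")` would then give the yes/no answer, while `IsMultiYear` with `Where` still gives the list of files.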
Upvotes: 1
Reputation: 11763
I'm not on my development machine, so this might not compile as is, but here's a direction:
// fileName is a placeholder for the path of the file being checked
var lines = File.ReadAllLines(fileName);
var years = from line in lines
            let parts = line.Split(',')
            select parts[2];
var distinct_years = years.Distinct();
if (distinct_years.Count() > 1)
{
    // this file has several years
}
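To apply this per-file check to the whole folder and also see which years each file contains, something like the following might work. It is only a sketch under stated assumptions: `YearReport` and `FilesWithSeveralYears` are made-up names, the extension filter is the `.csv`/`.dat` test from the question, and lines with fewer than three fields are silently skipped rather than treated as errors.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class YearReport
{
    // Hypothetical helper: maps each .csv/.dat file under 'folder' that contains
    // more than one distinct year to the sorted list of those years.
    public static Dictionary<string, List<string>> FilesWithSeveralYears(string folder)
    {
        var result = new Dictionary<string, List<string>>();
        foreach (var path in Directory.EnumerateFiles(folder, "*.*", SearchOption.AllDirectories))
        {
            var ext = Path.GetExtension(path);
            if (ext != ".csv" && ext != ".dat") continue;

            var years = File.ReadLines(path)
                .Select(line => line.Split(','))
                .Where(parts => parts.Length >= 3)   // assumption: skip short or blank lines
                .Select(parts => parts[2].Trim())
                .Distinct()
                .OrderBy(year => year)
                .ToList();

            if (years.Count > 1) result[path] = years;
        }
        return result;
    }
}
```

The keys of the returned dictionary are the files with records for more than one year; the values list the years found in each.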
Upvotes: 1