Meelfan Bmfp

Reputation: 605

Trying to query many text files in the same folder with LINQ

I need to search a folder containing CSV files. The records I'm interested in have 3 fields: Rec, Country and Year. My job is to search the files and see if any of them has records for more than a single year. Below is the code I have so far:

// Get each individual file from the folder.
string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
    System.IO.SearchOption.AllDirectories);
var queryMatchingFiles =
    from file in fileList
    where (file.Extension == ".dat" || file.Extension == ".csv")
    select file;

Then I came up with this code to read the year field from each file and find the files where the count of distinct years is more than 1 (the count part is not implemented yet):

public void GetFileData(string filesname, char sep)
{
    using (StreamReader reader = new StreamReader(filesname))
    {
        // Lines(...) is not a built-in StreamReader member; presumably a custom extension method.
        var recs = (from line in reader.Lines(sep.ToString())
                    let parts = line.Split(sep)
                    select parts[2]);
    }
}

Below is a sample file:

REC,IE,2014
REC,DE,2014
REC,FR,2015

Now I am struggling to combine these two ideas into a single query. The query should list the files that have records for more than one year.

Thanks in advance

Upvotes: 1

Views: 550

Answers (3)

Milen

Reputation: 8867

Something along these lines:

string startFolder = @"C:\MyFileFolder\";
System.IO.DirectoryInfo dir = new System.IO.DirectoryInfo(startFolder);
IEnumerable<System.IO.FileInfo> fileList = dir.GetFiles("*.*",
    System.IO.SearchOption.AllDirectories);
var fileData =
    from file in fileList
    where (file.Extension == ".dat" || file.Extension == ".csv")
    let multiYearFile = GetFileData(file.FullName, ',')
    where multiYearFile != null // keep only files with more than one year
    select multiYearFile;

// Returns the file name if the file has records for more than one year, otherwise null.
public string GetFileData(string filename, char sep)
{
    // File.ReadLines replaces reader.Lines(...), which is not a StreamReader member.
    var recs = from line in File.ReadLines(filename)
               where !string.IsNullOrWhiteSpace(line) // skip empty lines
               let parts = line.Split(sep)
               select parts[2];
    var multipleyears = recs.Distinct().Count();
    return multipleyears > 1 ? filename : null;
}

Upvotes: 1

Harald Coppoolse

Reputation: 30464

"My job is to search the files and see if any of the files has records for more than a single year."

This specifies that you want a Boolean result, one that says if any of the files has those records.

For fun I'll extend it a little bit more:

My job is to get the collection of files where any of the records is about more than a single year.

You were almost there. Let's first declare a class with the records in your file:

public class MyRecord
{
    public string Rec { get; set; }
    public string CountryCode { get; set; }
    public int Year { get; set; }
}

I'll write an extension method for the class FileInfo that reads the file and returns the sequence of MyRecords in it.

For extension methods see MSDN Extension Methods (C# Programming Guide)

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileInfoExtension
{
    public static IEnumerable<MyRecord> ReadMyRecords(this FileInfo file, char separator)
    {
        var records = new List<MyRecord>();
        using (var reader = new StreamReader(file.FullName))
        {
            var lineToProcess = reader.ReadLine();
            while (lineToProcess != null)
            {
                var splitLines = lineToProcess.Split(new char[] { separator }, 3);
                if (splitLines.Length < 3) throw new InvalidDataException();
                var record = new MyRecord()
                {
                    Rec = splitLines[0],
                    CountryCode = splitLines[1],
                    Year = Int32.Parse(splitLines[2]),
                };
                records.Add(record);
                lineToProcess = reader.ReadLine();
            }
        }
        return records;
    }
}

I could have used string instead of FileInfo, but IMHO a string is something completely different from a file name.

After the above you can write the following:

string startFolder = @"C:\MyFileFolder\";
var directoryInfo = new DirectoryInfo(startFolder);
var allFiles = directoryInfo.EnumerateFiles("*.*", SearchOption.AllDirectories);
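// ReadMyRecords extends a single FileInfo, so apply it to each file with Select.
var sequenceOfFileRecordCollections = allFiles.Select(file => file.ReadMyRecords(','));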

So now you have, per file, a sequence of the MyRecords in that file. You want to know which files have more than one year. Let's add another extension method to class FileInfoExtension:

public static bool IsMultiYear(this FileInfo file, char separator)
{
    // read the file, only return true if there are any records,
    // and if any record has a different year than the first record
    var myRecords = file.ReadMyRecords(separator);
    if (myRecords.Any())
    {
        int firstYear = myRecords.First().Year;
        return myRecords.Any(record => record.Year != firstYear);
    }
    else
    {
        return false;
    }
}

The sequence of files that have more than one year in them is:

allFiles.Where(file => file.IsMultiYear(','));

Put everything in one statement:

var allFilesWithMultiYear = new DirectoryInfo(@"C:\MyFileFolder\")
    .EnumerateFiles("*.*", SearchOption.AllDirectories)
    .Where(file => file.IsMultiYear(','));

By creating two fairly simple extension methods your problem became one highly readable statement.
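
As a possible variation (a sketch, not part of the approach above): ReadMyRecords loads every record into a list before any year is compared. If the files are large, the lines could instead be streamed and the scan stopped at the first year that differs. The class name LazyMultiYear and both method names below are hypothetical, and the code assumes the year is the third field and that empty lines can be skipped. The second method gives the plain Boolean answer mentioned at the start.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyMultiYear
{
    // Hypothetical helper: walks the lines once and returns true as soon as
    // a year differs from the first year seen. Assumes the year is field 3.
    public static bool SpansMultipleYears(IEnumerable<string> lines, char separator)
    {
        string firstYear = null;
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue; // skip empty lines

            var fields = line.Split(separator);
            if (fields.Length < 3)
                throw new InvalidDataException("Expected 3 fields: " + line);

            var year = fields[2].Trim();
            if (firstYear == null)
                firstYear = year;       // remember the first year
            else if (year != firstYear)
                return true;            // early exit, the rest is not read
        }
        return false; // no records, or every record has the same year
    }

    // Boolean form of the question: does any file under the folder
    // contain records for more than one year?
    public static bool AnyFileSpansMultipleYears(string folder, char separator)
    {
        return new DirectoryInfo(folder)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .Any(file => SpansMultipleYears(File.ReadLines(file.FullName), separator));
    }
}
```

File.ReadLines yields lines lazily, so a file whose second record already has a different year is only read up to that record. The years are compared as strings here; parse them to int if values such as "2014" and "02014" should count as equal.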

Upvotes: 1

Noctis

Reputation: 11763

I'm not on my development machine, so this might not compile "as is", but here's a direction:

// fileName: path of the file to inspect (assumed to be defined by the caller)
var lines = File.ReadAllLines(fileName);
var years = from line in lines
            let parts = line.Split(new [] {','})
            select parts[2];
var distinct_years = years.Distinct();
if (distinct_years.Count() > 1)
{
    // this file has several years
}
Upvotes: 1
