Zuzlx
Zuzlx

Reputation: 1266

LINQ way to change my inefficient program

Please consider the following records

enter image description here

I'm trying to flatten and group the data by Robot Name then by date + Left Factory time then group the address(es) for that date and time. Notice that some of the Left Factory times are identical.

I wrote the code below and it works. It gives me the output that I want. I was a Perl developer so what you see below is from that mentality. I'm sure there is a better way of doing it in LINQ. A little help please.

static void Main(string[] args)
{
    if (args.Length < 0){
        Console.WriteLine("Input file name is required");
        return;
    }


  List<string>  rawlst = File.ReadAllLines(args[0]).ToList<string>();

    Dictionary<string, Dictionary<DateTime, List<string>>> dicDriver = new Dictionary<string, Dictionary<DateTime, List<string>>>();



    foreach (string ln in rawlst)
    {
        try
        {
            List<string> parts = new List<string>();
            parts = ln.Split(',').ToList<string>();

            string[] dtparts = parts[1].Split('/');
            string[] dttime = parts[15].Split(':');
            DateTime dtrow = new DateTime(
                int.Parse(dtparts[2]), int.Parse(dtparts[0]), int.Parse(dtparts[1]),
                int.Parse(dttime[0]), int.Parse(dttime[1]), int.Parse(dttime[2]));

            string rowAddress = parts[7] + " " + parts[9] + " " + parts[10] + " " + parts[11];

            if (!dicDriver.Keys.Contains(parts[3]))
            {
                Dictionary<DateTime, List<string>> thisRec = new Dictionary<DateTime, List<string>>();
                thisRec.Add(dtrow, new List<string>() { rowAddress });

                dicDriver.Add(parts[3], thisRec);

            }
            else
            {
                Dictionary<DateTime, List<string>> thisDriver = new Dictionary<DateTime, List<string>>();

                thisDriver = dicDriver[parts[3]];
                if (!thisDriver.Keys.Contains(dtrow))
                {
                    dicDriver[parts[3]].Add(dtrow, new List<string>() { rowAddress });
                }
                else
                {
                    dicDriver[parts[3]][dtrow].Add(rowAddress);
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("ERROR:" + ln);
        }
    }

 //output
    string filename = DateTime.Now.Ticks.ToString() + ".out";
    foreach (var name in dicDriver.Keys)
    {
        foreach (var dd in dicDriver[name])
        {

                Console.Write(name + "," + dd.Key + ",");
            File.AppendAllText(filename, name + "," + dd.Key + Environment.NewLine);
            foreach (var addr in dd.Value)
            {
                Console.Write("\t\t" + addr + Environment.NewLine);
                File.AppendAllText(filename, "\t" + addr + Environment.NewLine);
            }

        }
        Console.Write(Environment.NewLine);
        File.AppendAllText(filename, Environment.NewLine);
    }

    Console.ReadLine();   
}

Upvotes: 1

Views: 61

Answers (1)

Harald Coppoolse
Harald Coppoolse

Reputation: 30512

You should separate your concerns: separate your input from the processing and from the output.

For example: suppose you would have to read your input from a database instead from a CSV file? Would that seriously change the way your process your fetched data? In your design, fetching the data is mixed with processing: although you know that the data that you want to process contains something like FactoryProcesses, you decide to present eache FactoryProcess as a string. A FactoryProcess is not a string. It describes how and when and who processed something in your factory. That is not a string, is it? However, it might be represented internally as a string, but the outside world should not know this. This way, if you change your FactoryProcess from being read by a CSV-file, to something provided by a database, the users of your FactoryProcess won't see any difference.

Separation of concerns makes your code easier to understand, easier to test, easier to change, and better to re-use.

So let's separate!

IEnumerable<FactoryProcess> ReadFactoryProcesses(string fileName)
{
     // TODO: check fileName not null, file exists
     using (var fileReader = new StreamReader(fileName))
     {
         // read the file Line by Line and split each line into one FactoryProcess object
         string line = fileReader.ReadLine();
         while (line != null)
         {
              // one line read, convert to FactoryProcess and yield return:
              FactoryProcess factoryProcess = this.ToFactoryProcess(line);
              yield return factoryProcess;

              // read next line:
              line = fileReader.ReadLine();
         }
    }               
}

I'll leave the conversion of a read line into a FactoryProcess up to you. Tip: if the items in your lines are separated by a comma, or something similar, consider using Nuget Package CSVHelper. It makes if easier to convert a file into a sequence of FactoryProcesses.

I want to group the data by Robot Name then by date + Left Factory time then group the address(es) for that date and time.

First of all: make sure that the FactoryProcess class has the properties you actually need. Separate this representation from what it is in a file. Apparently you want to tread date + left factory as one item that represents the Date and Time that it left the factory. So Let's create a DateTime property for this.

class FactoryProcess
{
    public int Id {get; set}
    public int PartNo {get; set;}
    public string RobotName {get; set;}      // or if desired: use a unique RobotId
    ...

    // DateTimes: ArrivalTime, OutOfFactoryTime, LeftFactoryTime
    public DateTime ArrivalTime {get; set;}
    public DateTime OutOfFactoryTime {get; set;}
    public DateTime LeftFactoryTime {get; set;}

}

The reason that I put Date and Time into one DateTime, is because it will solve problems if an item arrives on 23:55 and leaves on 00:05 next day.

A procedure that converts a read CSV-line to a FactoryProcess should interpret your dates and times as strings and convert to FactoryProcess. You can create a constrcuctor for this, or a special Factory class

public FactoryProcess InterpretReadLine(string line)
{
    // TODO: separate the parts, such that you've got the strings dateTxt, arrivalTimeTxt, ...
     DateTime date = DateTime.Parse(dateTxt);
     TimeSpan arrivalTime = TimeSpan.Parse(arrivalTimeTxt);
     TimeSpan outOfFactoryTime = TimeSpan.Parse(outOfFactoryTimeTxt);
     TimeSpan leftFactoryTime = TimeSpan.Parse(leftFactoryTimeTxt);

    return new FactoryProces
    {
        Id = ...
        PartNo = ..
        RobotName = ...

        // The DateTimes:
        ArrivalTime = date + arrivalTime,
        OutOfFactoryTime = date + outOfFactoryTime,
        LeftFactoryTime = date + leftFactoryTime,
    };
}

Now that you've created a proper method to convert your CSV-file into a sequence of FactoryProcesses, let's process them

I want to group the data by Robot Name then by date + Left Factory time then group the address(es) for that date and time.

var result = fetchedFactoryProcesses.GroupBy(

    // parameter KeySelector: make groups of FactoryProcesses with same RobotName:
    factoryProcess => factoryProcess.RobotName,

    // parameter ResultSelector: from every group of FactoryProcesses with this RobotName
    // make one new Object:
    (robotName, processesWithThisRobotName) => new
    {
        RobotName = robotName,

        // Group all processes with this RobotName into groups with same LeftFactoryTime:
        LeftFactory = processesWithThisRobotName.GroupBy(

            // KeySelector: make groups with same LeftFactoryTime
            process => process.LeftFactoryTime,

            // ResultSelector: from each group of factory processes with the same LeftFactoryTime
            (leftFactoryTime, processesWithThisLeftFactoryTime) => new
            {
                LeftFactoryTime = leftFactoryTime,
                FactoryProcesses = processesWithThisLeftFactoryTime,

                // or even better: select only the properties you actually plan to use
                FactoryProcesses = processesWithThisLeftFactoryTime.Select(process => new
                {
                     Id = process.Id,
                     PartNo = process.PartNo,
                     ...

                     // not needed: you know the value, because it is in this group
                     // RobotName = process.RobotName,
                     // LeftFactoryTime = process.LeftFactoryTime,
                }),
            })
});

For completeness: grouping your code together:

void ProcessData(string fileName)
{
     var fetchedFactoryProcesses = ReadFactoryProcess(fileName);  // fetch the data
     var groups = fetchFactoryProcesses.ToGroups();               // put into groups
     this.Display(groups);                                        // output result;
}

Because I separated the input from the conversion of strings to FactoryProcesses, and separated this conversion from the grouping, it will be easy to test the classes separately:

  • your CSV-reader should return any file that is divided into lines, even if it does not contain FactoryProcesses
  • your conversion from read line to FactoryProcess should convert any string that is in the proper format, whether it is read from a file or gathered any other way
  • your grouping should group any sequence of FactoryProcesses, whether they come from a CSV-file or from a database or List<FactoryProcess>, which is very convenient, because in your tests it is way easier to create a test list, than a test CSV-file.

If in future you decide to change the source of your sequence of FactoryProcesses, for instance it comes from a database instead of a CSV-file, your grouping won't change. Or if you decide to support entering and leaving factories on different dates (multiple date values) only the conversion changes. If you decide to display the results in a tree-like fashion, or decide to write the groups in a database, your reading, conversion, grouping, etc won't change: what a high degree or re-usability!

Separating your concerns made it much easier to understand how to solve your grouping problem, without the hassle of splitting your read lines and converting Date + LeftFactory into one value.

Upvotes: 1

Related Questions