Counting duplicate lines from text files using C#

Question

Hey guys, I'm working on a program that takes information from a text file and outputs it to a CSV file. One thing I need to do is implement a count of the duplicate records (where possible, duplicate records of an offense charged against an individual should be aggregated into a single record with an additional field called "counts" that indicates the number of duplicate records found; for non-duplicate records, this field should be set to zero). I've been having a bit of trouble adding the counter and was wondering if you had any advice for me.

Thank you

using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Text;

namespace finalproj
{
    class Program
    {
        static void Main(string[] args)
        {
            StreamReader reader = new StreamReader("DISTRICT.DISTRICT_COURT_.11.13.18.AM.000B.CAL.txt");

            StreamWriter writer = new StreamWriter("outtext.csv");

            int counts;
            string line = "";

            for (int x = 0; x < 1; x++)
            {
                string buffer = reader.ReadLine();
                line += " " + buffer;
            }

            //StreamWriter writer = new StreamWriter("outtext.csv");
            //writer.WriteLine(line);
            //writer.Close();

            //Console.WriteLine(line);

            while (line != null)
            {
                if (line.Contains("APT."))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("BPD"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("18IF"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("SHP"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("SFF"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("CLS:"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("BOND"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("ATTY"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("(T)"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("(M)"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("(F)"))
                {
                    Console.WriteLine(line);
                }
                else if (line.Contains("(I)"))
                {
                    Console.WriteLine(line);
                }


                // Write the current line before reading the next one, so the
                // first line is kept and the final null is never written.
                writer.WriteLine(line);
                line = reader.ReadLine();
            }

            reader.Close();
            writer.Close();


            //using (reader)
            //{
            //    
            //string line1;
            //string[] split = new
            //    while((line1 = reader.ReadLine()) !=null)
            //    {
            //        string[] split = 
            //    }
            //}

            Console.ReadKey();
        }
    }
}

Anu Viswan · Accepted Answer

To split the text into lines and count occurrences, you can split on Environment.NewLine and group the lines with LINQ (here str is assumed to hold the contents of the text file):

string[] lines = str.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
var result = lines.GroupBy(g => g)
            .Select(s => new { Key = s.Key, Count = s.Count() })
            .ToDictionary(d => d.Key, d => d.Count);

The result also includes lines that occur only once. If you want only the duplicated lines, filter the groups first:

var result = lines.GroupBy(g => g).Where(x => x.Count() > 1)
            .Select(s => new { Key = s.Key, Count = s.Count() })
            .ToDictionary(d => d.Key, d => d.Count);
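
To see what the two queries return, here is a minimal self-contained sketch. The sample lines are made up for illustration, and the names allCounts and duplicates are mine, not part of the answer. It is written with top-level statements (C# 9 or later); in an older project the body would go inside Main.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample input; in practice these would be the lines of the file.
string[] lines =
{
    "record A",
    "record B",
    "record A",
    "record A",
};

// Every distinct line mapped to the number of times it occurs.
Dictionary<string, int> allCounts = lines
    .GroupBy(line => line)
    .ToDictionary(group => group.Key, group => group.Count());

// Only the lines that occur more than once.
Dictionary<string, int> duplicates = allCounts
    .Where(pair => pair.Value > 1)
    .ToDictionary(pair => pair.Key, pair => pair.Value);

foreach (var pair in allCounts)
{
    Console.WriteLine($"{pair.Value} x {pair.Key}");
}
Console.WriteLine($"Duplicated lines: {duplicates.Count}");
```

Grouping directly into ToDictionary gives the same counts as the Select step with an anonymous type, just with one less projection. If the file may use line endings other than Environment.NewLine, reading it with File.ReadAllLines avoids the manual Split.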

You can then write the CSV directly from the dictionary:

File.WriteAllLines(filePath, result.Select(x => $"{x.Key},{x.Value}"));
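
For the "counts" field described in the question, a rough sketch along the same lines might look like the following. Several things here are assumptions rather than part of the answer: the helper names QuoteField and ToCsvRows are hypothetical, a record that appears once gets 0 while a repeated record gets its total number of occurrences (the requirement could also be read as "extra copies only"), and records are compared as whole trimmed lines. The record text is quoted because a line that contains a comma would otherwise spill into extra CSV columns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Wrap a field in double quotes and double any embedded quotes,
// so commas inside a record do not split it into extra columns.
static string QuoteField(string field) =>
    "\"" + field.Replace("\"", "\"\"") + "\"";

// Collapse identical records into one row each, with a trailing "counts" column.
// Assumption: a record seen once gets 0; a repeated record gets its total occurrences.
static List<string> ToCsvRows(IEnumerable<string> records)
{
    var rows = new List<string> { "record,counts" };
    rows.AddRange(records
        .Where(record => !string.IsNullOrWhiteSpace(record))
        .GroupBy(record => record.Trim())
        .Select(group =>
        {
            int occurrences = group.Count();
            int counts = occurrences > 1 ? occurrences : 0;
            return $"{QuoteField(group.Key)},{counts}";
        }));
    return rows;
}

// Hypothetical sample records standing in for the lines of the input file.
string[] sample = { "record A, charge 1", "record B", "record A, charge 1" };
List<string> csvRows = ToCsvRows(sample);

foreach (string row in csvRows)
{
    Console.WriteLine(row);
}
```

With a real file, the input could come from File.ReadAllLines(inputPath) and the rows could be saved with File.WriteAllLines(outputPath, csvRows), where inputPath and outputPath are placeholders for your own paths.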
