Reputation: 23
Hey guys, I'm working on a program that takes information from a text file and outputs it to a CSV file. One thing I need to do is implement a count of the duplicate records. The requirement reads: "Where possible, duplicate records of an offense charged against an individual should be aggregated into a single record with an additional field called "counts" that indicates the number of duplicate records found (for non-duplicate records, this field should be set to zero)." I've been having a bit of trouble adding the counter and was wondering if you guys had any advice for me.
Thank you
using System;
using System.IO;
using System.Linq;
using System.Collections.Generic;
using System.Text;
namespace finalproj
{
class Program
{
static void Main(string[] args)
{
StreamReader reader = new StreamReader("DISTRICT.DISTRICT_COURT_.11.13.18.AM.000B.CAL.txt");
StreamWriter writer = new StreamWriter("outtext.csv");
int counts;
string line = "";
for (int x = 0; x < 1; x++)
{
string buffer = reader.ReadLine();
line += " " + buffer;
}
//StreamWriter writer = new StreamWriter("outtext.csv");
//writer.WriteLine(line);
//writer.Close();
//Console.WriteLine(line);
while (line != null)
{
if (line.Contains("APT."))
{
Console.WriteLine(line);
}
else if (line.Contains("BPD"))
{
Console.WriteLine(line);
}
else if (line.Contains("18IF"))
{
Console.WriteLine(line);
}
else if (line.Contains("SHP"))
{
Console.WriteLine(line);
}
else if (line.Contains("SFF"))
{
Console.WriteLine(line);
}
else if (line.Contains("CLS:"))
{
Console.WriteLine(line);
}
else if (line.Contains("BOND"))
{
Console.WriteLine(line);
}
else if (line.Contains("ATTY"))
{
Console.WriteLine(line);
}
else if (line.Contains("(T)"))
{
Console.WriteLine(line);
}
else if (line.Contains("(M)"))
{
Console.WriteLine(line);
}
else if (line.Contains("(F)"))
{
Console.WriteLine(line);
}
else if (line.Contains("(I)"))
{
Console.WriteLine(line);
}
line = reader.ReadLine();
writer.WriteLine(line);
}
writer.WriteLine(line);
reader.Close();
writer.Close();
Console.WriteLine(line);
//using (reader)
//{
//
//string line1;
//string[] split = new
// while((line1 = reader.ReadLine()) !=null)
// {
// string[] split =
// }
//}
Console.ReadKey();
}
}
}
Upvotes: 1
Views: 997
Reputation: 4313
Here you go. I used a Regex to match the keywords you are looking for and a SortedSet to collect the lines already seen, so duplicates can be detected. Be aware that with big files this can use quite a bit of memory, since every unique line is kept, but as this is CSV related I think you are fine:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
namespace ConsoleApp4
{
class Program
{
static void Main(string[] args)
{
StreamReader reader = new StreamReader("DISTRICT.DISTRICT_COURT_.11.13.18.AM.000B.CAL.txt");
StreamWriter writer = new StreamWriter("outtext.csv");
int counts = 0;
string line ;
SortedSet<string> uniqueLine = new SortedSet<string>();
Regex findWords = new Regex(@"(APT\.|BPD|18IF|SHP|SFF|CLS:|BOND|ATTY|\(T\)|\(M\)|\(F\)|\(I\))");
while ((line = reader.ReadLine()) != null)
{
if (uniqueLine.Contains(line))
{
counts++;
}
else
{
uniqueLine.Add(line);
writer.WriteLine(line);
}
Match aMatch = findWords.Match(line);
if (aMatch.Success)
{
Console.WriteLine(line);
}
}
writer.WriteLine("Count:{0}", counts);
reader.Close();
writer.Close();
Console.ReadKey();
}
}
}
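The loop above writes one grand total at the end of the file. If what is needed is a per-record "counts" column, as the question describes, a dictionary keyed by the line text is one way to get there. The following is a minimal sketch rather than part of the original answer: `DuplicateCounter` and `CountDuplicates` are hypothetical names, and it assumes two records count as duplicates only when their lines are exactly equal.

```csharp
using System;
using System.Collections.Generic;

static class DuplicateCounter
{
    // Returns each distinct line in first-seen order, paired with the number
    // of extra copies found (0 when the line appears only once).
    // Assumption: duplicates are lines that are exactly equal as strings.
    public static List<KeyValuePair<string, int>> CountDuplicates(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        var order = new List<string>();

        foreach (string line in lines)
        {
            if (counts.ContainsKey(line))
            {
                counts[line]++;      // another copy of a line already seen
            }
            else
            {
                counts[line] = 0;    // first occurrence: no duplicates yet
                order.Add(line);     // remember the order lines first appeared in
            }
        }

        var result = new List<KeyValuePair<string, int>>();
        foreach (string line in order)
        {
            result.Add(new KeyValuePair<string, int>(line, counts[line]));
        }
        return result;
    }
}
```

Feeding it the lines of the input file and writing each pair's `Key` and `Value` as one row would give one aggregated record per distinct line. All counts are only known once the whole file has been read, so the rows have to be written after the loop rather than inside it.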
Upvotes: 0
Reputation: 18155
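To split the text into lines and count occurrences, you can Split on Environment.NewLine and group with LINQ (here str is presumably the full contents of the file, for example from File.ReadAllText):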
string[] lines = str.Split(new[] { Environment.NewLine },StringSplitOptions.None);
var result = lines.GroupBy(g => g)
.Select(s => new { Key = s.Key, Count = s.Count()})
.ToDictionary(d => d.Key, d => d.Count);
This result also includes lines that occur only once. If you want only the duplicated lines, filter the groups first:
var result = lines.GroupBy(g => g).Where(x=> x.Count()>1)
.Select(s => new { Key = s.Key, Count = s.Count()})
.ToDictionary(d => d.Key, d => d.Count);
You can then write the CSV directly from the dictionary:
File.WriteAllLines(filePath, result.Select(x => $"{x.Key},{x.Value}"));
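Two details may matter for the original requirement: the one-liner above writes the key unquoted, so a line that itself contains a comma would spill into extra columns, and the question asks for a count of zero on non-duplicates, not one. A rough sketch of how both could be handled follows. `CsvCounts`, `Escape` and `ToRows` are hypothetical names introduced for illustration, and the quoting follows the common CSV convention of doubling embedded quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvCounts
{
    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes (the usual CSV convention).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // One CSV row per distinct line: the line, then the number of extra
    // copies found (so a line seen once gets 0, as the question requires).
    public static List<string> ToRows(IEnumerable<string> lines)
    {
        return lines
            .GroupBy(l => l)
            .Select(g => Escape(g.Key) + "," + (g.Count() - 1))
            .ToList();
    }
}
```

The rows returned by `ToRows` can be passed straight to `File.WriteAllLines`. In LINQ to Objects, `GroupBy` yields groups in the order their keys first appear, so the output keeps the order of the input.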
Upvotes: 1