Reputation: 6959
I have a very simple text file parsing app which searches for an email address and, if found, adds it to a list.
Currently there are duplicate email addresses in the list and I'm looking for a quick way of trimming the list down to only contain distinct values - without iterating over them one by one :)
Here's the code:
var emailLines = new List<string>();
using (var stream = new StreamReader(@"C:\textFileName.txt"))
{
    while (!stream.EndOfStream)
    {
        var currentLine = stream.ReadLine();
        if (!string.IsNullOrEmpty(currentLine) && currentLine.StartsWith("Email: "))
        {
            emailLines.Add(currentLine);
        }
    }
}
Upvotes: 1
Views: 2185
Reputation: 754565
Try the following:
// requires: using System.IO; using System.Linq;
var emailLines = File.ReadAllLines(@"c:\textFileName.txt")
    .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
    .Distinct()
    .ToList();
The downside to this approach is that it reads all of the lines in the file into a string[]. This happens immediately and for large files will create a correspondingly large array. It's possible to get back the lazy reading of lines by using a simple iterator.
public static IEnumerable<string> ReadAllLinesLazy(string path) {
    using ( var stream = new StreamReader(path) ) {
        while (!stream.EndOfStream) {
            yield return stream.ReadLine();
        }
    }
}
The File.ReadAllLines call above can then just be replaced with a call to this function.
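As a rough sketch of the same idea without a hand-written iterator: on .NET Framework 4.0 and later, File.ReadLines enumerates the file lazily, so it can feed the same Where/Distinct pipeline. The class name EmailLines and the helpers DistinctEmailLines and FromFile are hypothetical names made up for this illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EmailLines
{
    // Hypothetical helper: keeps only non-empty lines that start with
    // "Email: " and drops repeated lines.
    public static List<string> DistinctEmailLines(IEnumerable<string> lines)
    {
        return lines
            .Where(x => !string.IsNullOrEmpty(x) && x.StartsWith("Email: "))
            .Distinct()
            .ToList();
    }

    // File.ReadLines yields lines one at a time instead of building a
    // string[] up front, unlike File.ReadAllLines.
    public static List<string> FromFile(string path)
    {
        return DistinctEmailLines(File.ReadLines(path));
    }
}
```

A call such as EmailLines.FromFile(@"C:\textFileName.txt") would then return the distinct "Email: " lines. Note that Distinct still has to remember every distinct line it has seen, so memory use grows with the number of unique matches rather than with the size of the file.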
Upvotes: 3
Reputation: 2532
IEnumerable/LINQ goodness (great for large files - lines are streamed one at a time and only the distinct matching lines are kept in memory):
// using System.Linq;
var emailLines = ReadFileLines(@"C:\textFileName.txt")
    .Where(line => line.StartsWith("Email: "))
    .Distinct()
    .ToList();
public IEnumerable<string> ReadFileLines(string fileName)
{
    using (var stream = new StreamReader(fileName))
    {
        while (!stream.EndOfStream)
        {
            yield return stream.ReadLine();
        }
    }
}
Upvotes: 1
Reputation: 85458
If you just need unique items, you could add your items to a HashSet instead of a List. Note that HashSets have no implied order. If you need an ordered set, you could use SortedSet instead.
var emailLines = new HashSet<string>();
Then there'd be no duplicates.
To remove duplicates from a List, you could use the LINQ Distinct() extension method (Enumerable.Distinct(), available on any IEnumerable<T>):
IEnumerable<string> distinctEmails = emailLines.Distinct();
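To make the HashSet suggestion concrete, here is a minimal sketch of the question's loop collecting into a set instead of a list. The class name EmailSet and the method Collect are hypothetical, and the method takes a TextReader (an assumption made here so the same code works with a StreamReader or a StringReader).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EmailSet
{
    // Hypothetical helper: reads lines until the reader is exhausted and
    // collects those starting with "Email: " into a set.
    public static HashSet<string> Collect(TextReader reader)
    {
        var emailLines = new HashSet<string>();
        string currentLine;
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine.StartsWith("Email: "))
            {
                // Add returns false and leaves the set unchanged
                // when the line is already present.
                emailLines.Add(currentLine);
            }
        }
        return emailLines;
    }
}
```

With the file from the question this could be called as using (var stream = new StreamReader(@"C:\textFileName.txt")) { var emails = EmailSet.Collect(stream); }. The duplicate check happens as each line is added, so no separate de-duplication pass is needed afterwards.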
Upvotes: 7