timothyclifford

Reputation: 6959

Getting unique items from a list of strings

I have a very simple text file parsing app which searches for an email address and, if one is found, adds it to a list.

Currently there are duplicate email addresses in the list and I'm looking for a quick way of trimming the list down to only contain distinct values - without iterating over them one by one :)

Here's the code:

var emailLines = new List<string>();
using (var stream = new StreamReader(@"C:\textFileName.txt"))
{
    while (!stream.EndOfStream)
    {
        var currentLine = stream.ReadLine();

        if (!string.IsNullOrEmpty(currentLine) && currentLine.StartsWith("Email: "))
        {
            emailLines.Add(currentLine);
        }
    }
}

Upvotes: 1

Views: 2185

Answers (3)

JaredPar

Reputation: 754565

Try the following:

var emailLines = File.ReadAllLines(@"c:\textFileName.txt")
  .Where(x => !String.IsNullOrEmpty(x) && x.StartsWith("Email: "))
  .Distinct()
  .ToList();

The downside to this approach is that it reads all of the lines in the file into a string[]. This happens immediately and for large files will create a correspondingly large array. It's possible to get back the lazy reading of lines by using a simple iterator.

public static IEnumerable<string> ReadAllLinesLazy(string path) { 
  using ( var stream = new StreamReader(path) ) {
    while (!stream.EndOfStream) {
      yield return stream.ReadLine();
    }
  }
}

The File.ReadAllLines call above can then just be replaced with a call to this function.
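
On .NET Framework 4 or later, File.ReadLines already enumerates a file lazily, so the hand-written iterator may not be needed. A minimal sketch, assuming that framework version is available; the class and method names (EmailReader, ReadDistinctEmails) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class EmailReader
{
    // Hypothetical helper: returns the distinct lines that start with "Email: ".
    public static List<string> ReadDistinctEmails(string path)
    {
        // File.ReadLines yields one line at a time instead of building a string[].
        return File.ReadLines(path)
            .Where(line => line.StartsWith("Email: ", StringComparison.Ordinal))
            .Distinct()
            .ToList();
    }
}
```

Calling EmailReader.ReadDistinctEmails(@"c:\textFileName.txt") would then take the place of the File.ReadAllLines pipeline above.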

Upvotes: 3

Will

Reputation: 2532

IEnumerable/LINQ goodness (great for large files - only the distinct matching lines are ever kept in memory):

// using System.Linq;

var emailLines = ReadFileLines(@"C:\textFileName.txt")
    .Where(line => line.StartsWith("Email: "))
    .Distinct()
    .ToList();

public IEnumerable<string> ReadFileLines(string fileName)
{
    using (var stream = new StreamReader(fileName))
    {
        while (!stream.EndOfStream)
        {
            yield return stream.ReadLine();
        }
    }
}

Upvotes: 1

NullUserException

Reputation: 85458

If you just need unique items, you could add your items to a HashSet instead of a List. Note that a HashSet has no guaranteed order. If you need an ordered set, you could use SortedSet instead.

var emailLines = new HashSet<string>();

Then there'd be no duplicates.
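
As a rough sketch of how the question's loop might look with a set, here is one possible version. The class and method names (EmailSetReader, ReadEmailSet) are hypothetical, and it takes a TextReader only so it can be tried without a real file:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class EmailSetReader
{
    // Hypothetical helper: collects matching lines into a set,
    // so duplicates are dropped as they are added.
    public static HashSet<string> ReadEmailSet(TextReader reader)
    {
        var emailLines = new HashSet<string>();
        string currentLine;

        // ReadLine returns null once the end of the input is reached.
        while ((currentLine = reader.ReadLine()) != null)
        {
            if (currentLine.StartsWith("Email: ", StringComparison.Ordinal))
            {
                // Add returns false and leaves the set unchanged
                // if the line is already present.
                emailLines.Add(currentLine);
            }
        }

        return emailLines;
    }
}
```

With the file from the question, this would be called as EmailSetReader.ReadEmailSet(stream) inside the existing using block.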


To remove duplicates from a List, you could use the LINQ extension method Enumerable.Distinct():

IEnumerable<string> distinctEmails = emailLines.Distinct();
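
Two details may be worth keeping in mind: Distinct() is evaluated lazily, so nothing is filtered until the result is enumerated, and the default comparison is case-sensitive. A small sketch, assuming email lines that differ only in case should count as duplicates (that requirement, and the names DistinctExample and DistinctIgnoreCase, are assumptions, not part of the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctExample
{
    // Hypothetical helper: de-duplicates while ignoring case differences.
    public static List<string> DistinctIgnoreCase(IEnumerable<string> emailLines)
    {
        // Distinct is lazy; ToList forces evaluation and produces a new list.
        return emailLines
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Dropping the comparer argument gives the plain, case-sensitive behaviour of the line above.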

Upvotes: 7
