hariette

Reputation: 33

Comparing two hashsets

I have two HashSets that load data from two different text files. The contents of both text files look like this:

name/12441431252132
name1/323244231244142
name2/32423452524234

My current code loads both files and makes sure I only keep the lines from textFile2 that do not appear in textFile1:

HashSet<string> txt1 = new HashSet<string>(File.ReadLines("textFile1.txt"));
HashSet<string> txt2 = new HashSet<string>(File.ReadLines("textFile2.txt"));

txt2.ExceptWith(txt1); 
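ExceptWith removes an element only when the set's comparer considers it equal to an element of the other collection, and for a HashSet<string> the default comparer compares entire strings. A minimal sketch with made-up in-memory lines in the same name/id format (WholeLineDemo is a hypothetical name) shows the effect:

```csharp
using System.Collections.Generic;

public static class WholeLineDemo
{
    // Replays the ExceptWith call with in-memory sets instead of
    // textFile1.txt / textFile2.txt; the sample lines are made up.
    public static HashSet<string> Run()
    {
        var txt1 = new HashSet<string> { "name/1", "name2/222" };
        var txt2 = new HashSet<string> { "name/1", "name2/333", "name3/444" };

        // The default string comparer requires the whole line to match,
        // so "name2/333" survives even though "name2" appears in txt1.
        txt2.ExceptWith(txt1);
        return txt2;
    }
}
```

Run() returns name2/333 and name3/444: only the exact duplicate name/1 is removed.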

My problem is that this only removes a line if the whole line matches. I want to remove lines based on the name instead. For example, name2 should never be included if it is in textFile1, even if the IDs after the / are different.
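One possible approach, not taken from the answers, is to keep the ExceptWith call but build the second HashSet with an equality comparer that only looks at the part before the first /. ExceptWith removes elements using the current set's own comparer, so a matching name is then enough. This is a sketch under that assumption; NameOnlyComparer and NameExcept are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: two lines are equal when the text before the
// first '/' (the name) is equal; the id after it is ignored.
public sealed class NameOnlyComparer : IEqualityComparer<string>
{
    private static string NameOf(string line)
    {
        int slash = line.IndexOf('/');
        return slash < 0 ? line : line.Substring(0, slash);
    }

    public bool Equals(string x, string y)
    {
        if (x == null || y == null) return x == y;
        return string.Equals(NameOf(x), NameOf(y), StringComparison.Ordinal);
    }

    public int GetHashCode(string line)
    {
        // Must agree with Equals, so only the name part is hashed.
        return line == null ? 0 : StringComparer.Ordinal.GetHashCode(NameOf(line));
    }
}

public static class NameExcept
{
    public static HashSet<string> Run(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        // txt2 uses the name-only comparer, so ExceptWith drops every line
        // whose name appears in lines1, whatever its id.
        var txt2 = new HashSet<string>(lines2, new NameOnlyComparer());
        txt2.ExceptWith(lines1);
        return txt2;
    }
}
```

With real files you would pass File.ReadLines("textFile1.txt") and File.ReadLines("textFile2.txt"). One side effect: because the set itself treats lines with the same name as equal, duplicate names inside textFile2 collapse to the first line read.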

How would I accomplish this?

Let me know if my explanation is unclear and I will try to improve it. Please excuse my English!

Upvotes: 2

Views: 595

Answers (3)

Tim Schmelter

Reputation: 460158

Are you sure that a HashSet is still the best choice? Here is a different approach using a Dictionary<String, String>:

// Requires using System.Linq; for Concat.
var lines1 = System.IO.File.ReadLines(path1);
var lines2 = System.IO.File.ReadLines(path2);
var allItems = new Dictionary<String, String>();
foreach (var line in lines1.Concat(lines2))
{
    String[] tokens = line.Split('/');
    if (tokens.Length == 2)
    {
        String name = tokens[0];
        String number = tokens[1];
        if (!allItems.ContainsKey(name))
            allItems.Add(name, number);
    }
}
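The loop above merges both files into one dictionary keyed by name. Because lines1 is enumerated first, a line from the second file is only added when its name is new, so recording those additions gives exactly the lines the question asks for. A minimal sketch of that idea over in-memory lines (DictionaryFilter and UniqueFromSecond are hypothetical names; pass File.ReadLines(path1) and File.ReadLines(path2) for real files):

```csharp
using System.Collections.Generic;

public static class DictionaryFilter
{
    public static List<string> UniqueFromSecond(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        var allItems = new Dictionary<string, string>();

        // Load the first file; its names take priority.
        foreach (var line in lines1)
        {
            string[] tokens = line.Split('/');
            if (tokens.Length == 2 && !allItems.ContainsKey(tokens[0]))
                allItems.Add(tokens[0], tokens[1]);
        }

        // A second-file line is kept only if its name was not seen yet.
        var result = new List<string>();
        foreach (var line in lines2)
        {
            string[] tokens = line.Split('/');
            if (tokens.Length == 2 && !allItems.ContainsKey(tokens[0]))
            {
                allItems.Add(tokens[0], tokens[1]);
                result.Add(line);
            }
        }
        return result;
    }
}
```

As in the original loop, a name that repeats inside the second file is kept only once (its first occurrence).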

Upvotes: 0

spender

Reputation: 120480

If you split each line on /, you can build a HashSet of the names that appear in the first file, then pick the items from the second file whose name does not appear in the first.

var nameValues1=
    File
     .ReadLines(fileName)
     .Select(line=>line.Split('/'))
     .Select(parts=>new {name=parts[0],value=parts[1]});
var nameValues2=
    File
     .ReadLines(fileName2)
     .Select(line=>line.Split('/'))
     .Select(parts=>new {name=parts[0],value=parts[1]});
var names1=new HashSet<string>(nameValues1.Select(nv=>nv.name));
var result=
    nameValues2
     .Where(nv=>!names1.Contains(nv.name))
     .Select(nv=>string.Format("{0}/{1}",nv.name,nv.value));
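A self-contained variant of this pipeline, assuming in-memory input so it runs without fileName and fileName2 (NameSetFilter and Filter are hypothetical names). It also splits only on the first / and skips lines without one; the original would throw an IndexOutOfRangeException on parts[1] for such a line:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NameSetFilter
{
    public static List<string> Filter(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        // count = 2: split at the first '/' only; drop lines without one.
        var nameValues1 = lines1
            .Select(line => line.Split(new[] { '/' }, 2))
            .Where(parts => parts.Length == 2)
            .Select(parts => new { name = parts[0], value = parts[1] });
        var nameValues2 = lines2
            .Select(line => line.Split(new[] { '/' }, 2))
            .Where(parts => parts.Length == 2)
            .Select(parts => new { name = parts[0], value = parts[1] });

        var names1 = new HashSet<string>(nameValues1.Select(nv => nv.name));

        // ToList() runs the lazy LINQ query once instead of on every enumeration.
        return nameValues2
            .Where(nv => !names1.Contains(nv.name))
            .Select(nv => string.Format("{0}/{1}", nv.name, nv.value))
            .ToList();
    }
}
```

For real files, pass File.ReadLines(fileName) and File.ReadLines(fileName2).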

Upvotes: 0

BrokenGlass

Reputation: 160922

You can just add some string splitting to separate the names from the rest of the content. The approach is a little "dirty", so in real code I would probably use foreach loops and introduce dedicated classes:

var content = File.ReadLines("textFile1.txt").Select(line => 
{
    var parts = line.Split('/');
    return new 
    { 
        Name = parts[0],
        Content = parts[1]
    };
});

HashSet<string> names = new HashSet<string>(content.Select(c=> c.Name));
HashSet<string> txt2 = new HashSet<string>(File.ReadLines("textFile2.txt"));
var uniques = txt2.Where(line => !names.Contains(line.Split('/')[0]));
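For the "real code" version with foreach loops and a dedicated class, here is a sketch of what that could look like, assuming each line splits at its first /; NameEntry and EntryFilter are hypothetical names:

```csharp
using System.Collections.Generic;

// Hypothetical type for one "name/id" line.
public sealed class NameEntry
{
    public string Name { get; }
    public string Id { get; }

    public NameEntry(string name, string id)
    {
        Name = name;
        Id = id;
    }

    // Returns null when the line has no '/', so callers can skip it.
    public static NameEntry Parse(string line)
    {
        int slash = line.IndexOf('/');
        return slash < 0 ? null : new NameEntry(line.Substring(0, slash), line.Substring(slash + 1));
    }

    public override string ToString() => Name + "/" + Id;
}

public static class EntryFilter
{
    public static List<NameEntry> UniqueByName(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        // Collect the names from the first file.
        var names = new HashSet<string>();
        foreach (var line in lines1)
        {
            var entry = NameEntry.Parse(line);
            if (entry != null)
                names.Add(entry.Name);
        }

        // Keep second-file entries whose name is not in that set.
        var uniques = new List<NameEntry>();
        foreach (var line in lines2)
        {
            var entry = NameEntry.Parse(line);
            if (entry != null && !names.Contains(entry.Name))
                uniques.Add(entry);
        }
        return uniques;
    }
}
```

ToString() reassembles the original name/id line if you need to write the results back out.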

Upvotes: 3
