Thierry

Reputation: 6468

Find duplicates within multiple lists using linq

I've got 4 classes, each inheriting from the same base class, so they all have an Id and a Path property in common. When my data is loaded from XML, I build 4 lists, one for each of these classes.

I've got the following code which allows me to compare 2 lists:

var list1 = settings.EmailFolders.Select(m => m.Path).ToList().
Intersect(settings.LogPaths.Select(m=>m.Path).ToList()).ToList();

but what I'd like to do is this:

  1. Compare the 1st list to the other 3 and find out whether there are any duplicate paths.

  2. If none are found, apply the same logic to the 2nd list, comparing it to the remaining 2.

  3. If none are found, apply the same logic to the 3rd list, comparing it to the last one.

Please note that I don't want points 1, 2 and 3 in a single process, as I'd like to report the relevant error if duplicates are found.

If I reuse the above and chain an additional Intersect for each additional list:

var list1 = settings.EmailFolders.Select(m => m.Path).ToList().
Intersect(settings.LogPaths.Select(m=>m.Path).ToList()).
Intersect(settings.SuccessPaths.Select(m=>m.Path).ToList()).
Intersect(settings.FailurePaths.Select(m=>m.Path).ToList()).ToList();

the result is like applying an AND operator between the lists rather than an OR operator, which is not what I want. I need to know whether any of the paths used in FailurePaths, LogPaths or SuccessPaths is already used as an EmailFolder Path.
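For illustration, one way to get the OR behaviour is to pool the other three lists with Concat and intersect once. This is only a minimal sketch: the string arrays are made-up sample data standing in for the Path values of each list, and it does not say which list a collision came from.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the Path values of each list.
var emailPaths = new[] { "a", "b", "c" };
var logPaths = new[] { "c", "d", "e" };
var successPaths = new[] { "f", "g", "h" };
var failurePaths = new[] { "a", "c", "h" };

// Pool the three other lists first, then intersect once:
// a path is kept if it appears in ANY of the other lists (OR),
// whereas chained Intersect calls keep it only if it appears in ALL of them (AND).
var usedElsewhere = emailPaths
    .Intersect(logPaths.Concat(successPaths).Concat(failurePaths))
    .ToList();

Console.WriteLine(string.Join(", ", usedElsewhere)); // prints "a, c"
```

With the chained version the same data would give an empty result, because no path appears in all four lists.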

UPDATE:

I've updated my question with an answer based on @Devuxer's suggestion, which is the correct answer. Instead of working with string arrays, though, mine is based on lists of classes that all contain a Path property.

List<EmailFolder> emailFolders = LoadEmailFolders();
List<LogPath> logPaths = LoadLogPaths();
List<SuccessPath> successPaths = LoadSuccessPaths();
List<FailurePath> failurePaths = LoadFailurePaths();

// Compare the email folder paths against each of the other lists in turn,
// tagging every collision with the list it was found in.
var pathCollisions = emailFolders.Select(a => a.Path)
    .Intersect(logPaths.Select(b => b.Path))
    .Select(x => new { Type = "LogPath", Path = x })
    .Concat(emailFolders.Select(c => c.Path)
        .Intersect(successPaths.Select(d => d.Path))
        .Select(x => new { Type = "SuccessPath", Path = x }))
    .Concat(emailFolders.Select(e => e.Path)
        .Intersect(failurePaths.Select(f => f.Path))
        .Select(x => new { Type = "FailurePath", Path = x }))
    .ToList();

Note that in the above example, for simplicity's sake, I've declared each list with a function that loads the relevant data; in my case, all the data is deserialized and loaded at once.

Upvotes: 1

Views: 4983

Answers (2)

devuxer

Reputation: 42384

I may be slightly misunderstanding your requirements, but it looks like you want something like this:

var emailFolders = new[] { "a", "b", "c" };
var logPaths = new[] { "c", "d", "e" };
var successPaths = new[] { "f", "g", "h" };
var failurePaths = new[] { "a", "c", "h" };

var pathCollisions = emailFolders
    .Intersect(logPaths)
    .Select(x => new { Type = "Email-Log", Path = x })
    .Concat(emailFolders
        .Intersect(successPaths)
        .Select(x => new { Type = "Email-Success", Path = x }))
    .Concat(emailFolders
        .Intersect(failurePaths)
        .Select(x => new { Type = "Email-Failure", Path = x }));

This results in:

Type          | Path
--------------------
Email-Log     |  c 
Email-Failure |  a 
Email-Failure |  c
--------------------

If this is what you were looking for, Intersect was definitely the right way to find the duplicates. I just added some Concat clauses and provided a Type so you could see what type of path collision occurred.

Edit

Well, the bullets in your question seem to ask for something different from the last sentence of your question.

If you really want all comparisons, one way to do that is to add more Concat clauses:

var pathCollisions = emailFolders
    .Intersect(logPaths)
    .Select(x => new { Type = "Email-Log", Path = x })
    .Concat(emailFolders
        .Intersect(successPaths)
        .Select(x => new { Type = "Email-Success", Path = x }))
    .Concat(emailFolders
        .Intersect(failurePaths)
        .Select(x => new { Type = "Email-Failure", Path = x }))
    .Concat(logPaths
        .Intersect(successPaths)
        .Select(x => new { Type = "Log-Success", Path = x }))
    .Concat(logPaths
        .Intersect(failurePaths)
        .Select(x => new { Type = "Log-Failure", Path = x }))
    .Concat(successPaths
        .Intersect(failurePaths)
        .Select(x => new { Type = "Success-Failure", Path = x }));

This doesn't "short-circuit" the way you intended, though (so it will find all the path collisions rather than stop after the first type), but it should at least give you the data you need to generate a report.
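If the stop-after-the-first-stage behaviour matters, one possible sketch is to group the comparisons into stages and keep only the first stage that finds anything. The Collisions helper and the stages array are names invented for this example, and the sample arrays are the same ones used above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var emailFolders = new[] { "a", "b", "c" };
var logPaths = new[] { "c", "d", "e" };
var successPaths = new[] { "f", "g", "h" };
var failurePaths = new[] { "a", "c", "h" };

// Hypothetical helper: lazily yields the paths two lists share, tagged with a type.
IEnumerable<(string Type, string Path)> Collisions(
    string type, IEnumerable<string> first, IEnumerable<string> second) =>
    first.Intersect(second).Select(path => (Type: type, Path: path));

// One lazy query per stage: each list is compared only to the lists after it.
var stages = new[]
{
    Collisions("Email-Log", emailFolders, logPaths)
        .Concat(Collisions("Email-Success", emailFolders, successPaths))
        .Concat(Collisions("Email-Failure", emailFolders, failurePaths)),
    Collisions("Log-Success", logPaths, successPaths)
        .Concat(Collisions("Log-Failure", logPaths, failurePaths)),
    Collisions("Success-Failure", successPaths, failurePaths),
};

// Evaluate the stages in order and keep the first one that has any collisions;
// later stages are never enumerated once a stage reports something.
var firstFailure = stages
    .Select(stage => stage.ToList())
    .FirstOrDefault(found => found.Count > 0)
    ?? new List<(string Type, string Path)>();

foreach (var collision in firstFailure)
    Console.WriteLine($"{collision.Type}: {collision.Path}");
```

With this sample data the first stage already reports collisions, so the Log and Success stages are skipped.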

Upvotes: 1

Ali Adravi

Reputation: 22823

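You can use Contains, comparing on the Path property:

 var result = list1.Where(x => list2.Contains(x.Path));

This returns all the items from list1 whose Path is also in list2. For Contains(x.Path) to compile, list2 has to be a collection of path strings rather than a list of the objects themselves.

If you only want the count, add .Count() at the end.

The same approach applies when comparing against the other 3 lists.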
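As a rough sketch of this approach, the second list can be projected to its paths first. The anonymous objects are placeholders for the classes in the question, and putting the paths in a HashSet is a choice made for this example so that each Contains call is a set lookup rather than a scan of the list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Anonymous objects stand in for the question's classes; only Path matters here.
var list1 = new[]
{
    new { Id = 1, Path = "a" },
    new { Id = 2, Path = "b" },
    new { Id = 3, Path = "c" },
};
var list2 = new[]
{
    new { Id = 10, Path = "c" },
    new { Id = 11, Path = "d" },
};

// Contains needs a collection of strings, so project list2 to its paths first.
var list2Paths = new HashSet<string>(list2.Select(y => y.Path));

// Items from list1 whose Path also occurs in list2 (the full objects are kept).
var result = list1.Where(x => list2Paths.Contains(x.Path)).ToList();

Console.WriteLine(result.Count); // prints 1
```

Unlike Intersect on the paths alone, this keeps the whole list1 item, so its Id is still available for reporting.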

Upvotes: 1
