Zian Choy

Reputation: 2894

Why doesn't the Union function in LINQ remove duplicate entries?

I'm using VB.NET, and I know that Union normally compares objects by reference, but in VB, Strings are generally treated as if they were primitive data types.

Consequently, here's the problem:

Sub Main()
    Dim firstFile, secondFile As String(), resultingFile As New StringBuilder

    firstFile = My.Computer.FileSystem.ReadAllText(My.Computer.FileSystem.SpecialDirectories.Desktop & "\1.txt").Split(vbNewLine)
    secondFile = My.Computer.FileSystem.ReadAllText(My.Computer.FileSystem.SpecialDirectories.Desktop & "\2.txt").Split(vbNewLine)

    For Each line As String In firstFile.Union(secondFile)
        resultingFile.AppendLine(line)
    Next

    My.Computer.FileSystem.WriteAllText(My.Computer.FileSystem.SpecialDirectories.Desktop & "\merged.txt", resultingFile.ToString, True)
End Sub

1.txt contains:
a
b
c
d
e

2.txt contains:
b
c
d
e
f
g
h
i
j

After running the code, I get:
a
b
c
d
e
b
f
g
h
i
j

Any suggestions for making the Union function act like its mathematical counterpart?

Upvotes: 6

Views: 10998

Answers (2)

Kelsey

Reputation: 47726

I think you want to use the Distinct function. At the end of your LINQ statement, call .Distinct():

var distinctList = yourCombinedList.Distinct();

Similar to a 'SELECT DISTINCT' in SQL :)

Upvotes: 2

Robert Paulson

Reputation: 18061

LINQ's Union does behave the way you want. Check that your input files are what you expect (for example, one of the lines may contain a space before the newline), or Trim() the strings after splitting.

var list1 = new[] { "a", "s", "d" };
var list2 = new[] { "d", "a", "f", "123" };
var union = list1.Union(list2);
union.Dump(); // this is a LinqPad method

In LINQPad, the result is { "a", "s", "d", "f", "123" }
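For the VB.NET code in the question, the Trim() suggestion could look roughly like the sketch below. It rests on assumptions not stated in the thread: the files use CR+LF line endings, and Split(vbNewLine) is binding to the Char overload (as can happen on .NET Framework with Option Strict Off), so it splits on the CR only and leaves a leading LF on every line after the first. That would be consistent with "b" appearing twice in the output, but it is a guess about the cause. The literal strings stand in for the file contents, and the module and variable names are made up for illustration.

```vbnet
Imports System
Imports System.Linq
Imports Microsoft.VisualBasic

Module UnionTrimSketch
    Sub Main()
        ' Hypothetical stand-ins for the two files, assuming CR+LF line endings.
        Dim firstText As String = "a" & vbCrLf & "b" & vbCrLf & "c"
        Dim secondText As String = "b" & vbCrLf & "c" & vbCrLf & "d"

        ' Passing a String array makes Split use the whole CR+LF sequence
        ' as the separator instead of a single character.
        Dim separators As String() = {vbCrLf}

        ' Trim() each line so stray whitespace cannot make lines that look
        ' identical compare as different.
        Dim firstLines = firstText.Split(separators, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(s) s.Trim())
        Dim secondLines = secondText.Split(separators, StringSplitOptions.RemoveEmptyEntries).
            Select(Function(s) s.Trim())

        ' Union compares strings by value, so each distinct line appears once.
        For Each line As String In firstLines.Union(secondLines)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

With the sample strings above, this prints a, b, c and d, each exactly once. If the real files mix line endings, trimming alone should still be enough to make matching lines compare equal.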

Upvotes: 16
