JadonR

Reputation: 193

Use PowerShell to compare two text files and remove duplicate lines

I have two text files that contain many duplicate lines. I would like to run a PowerShell statement that outputs a new file containing only the lines NOT already in the first file. Below is an example of the two files.

File1.txt
-----------
Alpha
Bravo
Charlie


File2.txt
-----------
Alpha
Echo
Foxtrot

In this case, only Echo and Foxtrot are not in the first file. So these would be the desired results.

OutputFile.txt
------------
Echo
Foxtrot

I reviewed the link below, which is similar to what I want, but it does not write the results to an output file.

Remove lines from file1 that exist in file2 in Powershell

Upvotes: 2

Views: 4980

Answers (2)

nabrond

Reputation: 1388

The approach in the referenced link will work; however, for every line in the original file it reads the second file from disk again. This could be painful if your files are large. I think the following approach would meet your needs.

$file1 = Get-Content .\File1.txt
$file2 = Get-Content .\File2.txt

$compareParams = @{
    ReferenceObject = $file1
    DifferenceObject = $file2
}

Compare-Object @compareParams | 
    Where-Object -Property SideIndicator -eq '=>' |
    Select-Object -ExpandProperty InputObject |
    Out-File -FilePath .\OutputFile.txt

This code does the following:

  1. Reads each file into a separate variable
  2. Creates a hashtable for the parameters of Compare-Object (see about_Splatting for more information)
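  3. Compares the two files in memory, keeps only the lines that appear solely in File2.txt (SideIndicator '=>'), and passes their text (the InputObject property) to Out-File
  4. Writes the contents of the pipeline to "OutputFile.txt"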
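To see why the filter in step 3 uses '=>', it can help to look at the raw Compare-Object output. The sketch below inlines the question's sample data as string arrays, so no files are needed; $file1 and $file2 simply stand in for the result of Get-Content. Without -IncludeEqual, Compare-Object emits only the differences, and SideIndicator records which side each line came from.

```powershell
# Sample data from the question, inlined as string arrays (stand-ins for Get-Content)
$file1 = 'Alpha', 'Bravo', 'Charlie'
$file2 = 'Alpha', 'Echo', 'Foxtrot'

# Each result object carries the line (InputObject) and the side it came from (SideIndicator)
$diff = Compare-Object -ReferenceObject $file1 -DifferenceObject $file2
$diff | Format-Table InputObject, SideIndicator

# '=>' means only in DifferenceObject (File2.txt); '<=' means only in ReferenceObject (File1.txt)
$onlyInFile2 = $diff | Where-Object SideIndicator -eq '=>' | Select-Object -ExpandProperty InputObject
$onlyInFile2
```

Alpha appears in both arrays, so it is not emitted at all; Bravo and Charlie carry '<=' and are dropped by the filter.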

If you are comfortable with the overall flow of this, and are only using this in one-off situations, the whole thing can be compressed into a one-liner.

(Compare-Object (gc .\File1.txt) (gc .\File2.txt) | ? SideIndicator -eq '=>').InputObject | Out-File .\OutputFile.txt

Upvotes: 2

Glenn

Reputation: 1855

Here's one way to do it:

# Get unique values from first file
$uniqueFile1 = (Get-Content -Path .\File1.txt) | Sort-Object -Unique

# Get lines in second file that aren't in first and save to a file
Get-Content -Path .\File2.txt | Where-Object { $uniqueFile1 -notcontains $_ } | Out-File .\OutputFile.txt
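A variation on this answer, not part of it: -notcontains scans the whole array for every line of File2.txt, which can get slow when both files are large. A minimal sketch, assuming PowerShell 5 or later (for ::new), that loads File1.txt into a .NET HashSet[string] so each lookup is roughly constant time. Get-NewLines and its parameter names are hypothetical; the OrdinalIgnoreCase comparer is chosen to mimic the case-insensitive matching of -notcontains.

```powershell
# Hypothetical helper: returns the lines of DifferencePath that do not appear in ReferencePath
function Get-NewLines {
    param(
        [string]$ReferencePath,
        [string]$DifferencePath
    )
    # @() guards against an empty or single-line reference file; the comparer ignores case
    $reference = [System.Collections.Generic.HashSet[string]]::new(
        [string[]]@(Get-Content -Path $ReferencePath),
        [System.StringComparer]::OrdinalIgnoreCase)

    # Stream the second file and keep only lines missing from the set
    Get-Content -Path $DifferencePath | Where-Object { -not $reference.Contains($_) }
}
```

Usage with the question's files: `Get-NewLines -ReferencePath .\File1.txt -DifferencePath .\File2.txt | Out-File .\OutputFile.txt`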

Upvotes: 3
