Philip Zx

Reputation: 15

How to remove duplicate words or lines by comparing two text files using batch?

I'm writing a batch script that compares two text files and removes the lines that appear in both of them.

This is for my personal use, to speed up my work by removing duplicate lines from two text files.

I am using the code below:

copy textfile1.txt output.txt >nul
findstr /lvxig:textfile1.txt textfile2.txt >>output.txt

textfile1.txt contains,

apple
orange
mango

textfile2.txt contains,

apple
mango
grapes

I expect output.txt to contain:

orange
grapes

But the output I am getting in output.txt is:

apple
orange
mango
grapes

I don't want to merge the two text files. I want to compare them, remove every line that appears in both, and keep only the lines that are unique to one file.

Upvotes: 2

Views: 1880

Answers (3)

acgbox

Reputation: 334

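Try this (it needs bash or a similar Unix shell, since cmd.exe has no cat, grep, comm, or <( ) process substitution):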

cat textfile1.txt textfile2.txt | grep -Fvxf <(comm -12 <(sort -u textfile1.txt) <(sort -u textfile2.txt))

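Explanation of this code:

cat: reads both files and writes their contents to the pipe

comm -12 <(sort -u textfile1.txt) <(sort -u textfile2.txt): prints only the lines that appear in both files (comm expects sorted input, hence sort -u)

grep -Fvxf: removes every line that exactly matches one of the common lines reported by comm -12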

So:

textfile1.txt:

apple
orange
mango

textfile2.txt:

apple
mango
grapes

out:

orange
grapes

which is what the asker wants.
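For anyone who wants to inspect the intermediate result, the pipeline can be unpacked into separate steps. This is only a sketch: the function name unique_lines and the temporary file names are made up for illustration, it assumes mktemp and GNU-style comm and grep are available, and it uses temporary files instead of <( ) so that it does not depend on bash-specific syntax.

```shell
# Hypothetical helper: print the lines that occur in only one of two files.
unique_lines() {
    tmp=$(mktemp -d) || return 1
    # Step 1: comm needs sorted input, so sort (and de-duplicate) each file.
    sort -u "$1" > "$tmp/a"
    sort -u "$2" > "$tmp/b"
    # Step 2: -12 suppresses columns 1 and 2, leaving only the shared lines.
    comm -12 "$tmp/a" "$tmp/b" > "$tmp/common"
    # Step 3: print both files, dropping every line that exactly (-x)
    # matches one of the shared lines as a literal string (-F).
    cat "$1" "$2" | grep -Fvx -f "$tmp/common"
    rm -rf "$tmp"
}
```

Called as unique_lines textfile1.txt textfile2.txt > output.txt, this should leave orange and grapes in output.txt for the sample data, in the same order as the original one-liner.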

Upvotes: 0

aschipfl

Reputation: 34989

What about this approach:

findstr /LVXIG:"textfile2.txt" "textfile1.txt" > "output.txt"
findstr /LVXIG:"textfile1.txt" "textfile2.txt" >>"output.txt"

Or with a single redirection shared by both commands:

(
    findstr /LVXIG:"textfile2.txt" "textfile1.txt"
    findstr /LVXIG:"textfile1.txt" "textfile2.txt"
) > "output.txt"

Using your example data, the first findstr command line returns:

orange

And the second one outputs:

grapes
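The reason two passes are needed is that each findstr call only filters one file against the other, so the comparison has to be run in both directions. For readers on a Unix-like shell, roughly the same idea can be sketched with grep, whose -F, -v, -x, -i and -f options correspond to findstr's /L, /V, /X, /I and /G. The function name two_pass_diff is hypothetical, and GNU grep behaviour is assumed (an empty pattern file matches nothing).

```shell
# Hypothetical helper mirroring the two findstr passes with grep.
# -F literal strings, -v invert the match, -x match whole lines only,
# -i ignore case, -f FILE read the search strings from FILE.
two_pass_diff() {
    grep -Fvxi -f "$2" "$1"   # lines of file 1 that do not occur in file 2
    grep -Fvxi -f "$1" "$2"   # lines of file 2 that do not occur in file 1
    return 0                  # grep exits 1 when it prints nothing; ignore that
}
```

Usage would be two_pass_diff textfile1.txt textfile2.txt > output.txt.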

Upvotes: 2

lit

Reputation: 16266

How about creating a hash and counting the occurrences of each line? Then output only the lines that have exactly one (1) occurrence. This would avoid reading both files twice.

=== undupe.ps1

$hash = @{}
Get-Content 'textfile1.txt', 'textfile2.txt' | ForEach-Object { $hash[$_]++ }
foreach ($key in $hash.Keys) { if ($hash[$key] -eq 1) { Write-Output $key } }

Run it from a cmd shell or a .bat script:

powershell -NoLogo -NoProfile -File "undupe.ps1" >output.txt
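The same counting idea can be sketched on a Unix-like shell with awk's associative arrays. Treat this as an illustration rather than a drop-in replacement: the function name count_once is made up, awk compares lines case-sensitively, and the order of the printed lines is unspecified. As with any plain counting approach, a line repeated inside a single file is counted more than once and therefore dropped.

```shell
# Hypothetical helper: print every line that occurs exactly once
# across all the files given as arguments.
count_once() {
    # c[$0]++ counts each distinct line while awk reads the files once;
    # the END block then prints the lines whose count is exactly 1.
    awk '{ c[$0]++ } END { for (k in c) if (c[k] == 1) print k }' "$@"
}
```

Usage would be count_once textfile1.txt textfile2.txt > output.txt; pipe the result through sort if a stable order matters.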

Upvotes: 0
