Reputation: 25
I have 2 files.
For example, the content of file #1 is:
hi1
hi2
hi4
… of file #2 is:
hi1
hi4
hi3
hi5
I would like to compare these files so that a third file contains only the lines that appear in exactly one of them:
hi2
hi3
hi5
Can anyone point me in the right direction? I'm in dire need! Perl is preferred, but C/C++ is acceptable.
Upvotes: 0
Views: 2063
Reputation: 3744
Count each line, then print out the ones where the count is one:
#!/usr/bin/perl
use warnings;
use strict;

local @ARGV = ('file.1', 'file.2');  # the diamond operator reads both files in turn
my %lines;
while (<>) {
    $lines{$_}++;                    # count how many times each line occurs
}
# a count of 1 means the line appeared in only one of the files
print sort grep $lines{$_} == 1, keys %lines;
Upvotes: 0
Reputation: 9188
I know you asked for Perl or C, but on Unix (or on Windows with MKS or an equivalent Unix toolkit):
sort file1 file2 | uniq -u > file3
It doesn't get much simpler than that.
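To see it on the sample data from the question (a quick sketch; the file names `file1`, `file2`, `file3` are just placeholders):

```shell
# Recreate the two sample files from the question
printf 'hi1\nhi2\nhi4\n' > file1
printf 'hi1\nhi4\nhi3\nhi5\n' > file2

# sort merges and orders both files; uniq -u then keeps only the
# lines that occur exactly once in the combined, sorted input
sort file1 file2 | uniq -u > file3

cat file3
# hi2
# hi3
# hi5
```

Note that `uniq` only collapses *adjacent* duplicates, which is why the `sort` step is required first.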
Upvotes: 4
Reputation: 1055
Here's a quick bit of code to do what you want. There's no error checking, and I'm assuming your text files are not so huge that you'll run out of memory by loading all the text into a hash.
use strict;
use warnings;

open(my $fh1, '<', 'file1.txt');
open(my $fh2, '<', 'file2.txt');
my @file1 = <$fh1>;
my @file2 = <$fh2>;

my %text;
foreach my $line (@file1, @file2)
{
    chomp($line);
    $text{$line}++;    # count occurrences across both files
}

foreach my $line (sort keys %text)
{
    if ($text{$line} == 1)    # lines seen exactly once are unique to one file
    {
        print $line . "\n";
    }
}
Upvotes: 2
Reputation: 903
I'm still not sure you're describing the problem completely: hi3 is not duplicated, but hi4 is. So should the output contain hi3 instead of hi4? Hint: to detect duplicates in Perl, you probably want to use a hash.
Upvotes: -1