ybc

Reputation: 67

Using Perl, how do I compare two files and keep only the unique lines of the first file, discarding any line that also appears in the second file?

I have two files.

For example, the content of file #1 is:

dynSamp/dgenExp
dynSamp/dgenLod
dynSamp/dgenStm
dynSamp/dgenUpd
dynSamp/dmlnodExp
dynSamp/dmlnodLod
dynSamp/dmlnodStm
dynSamp/dmlnodUpd
dynSamp/dmndynLod
dynSam/dmndynStm
dynSamp/dmndynUpd
sample/genExp
sample/genLod
sample/genStm
sample/genUpd
sample/mlnodExp
sample/mlnodLod
sample/mlnodStm
sample/mlnodUpd
sample/mndynLod
sample/mndynStm
sample/mndynUpd
sample/genLod
dynSamp/dgenLod
dynSamp/dmlnodLod
dynSamp/dmndynLod
sample/mndynLod
sample/mlnodLod

And the content of file #2 is:

dynSamp/dgenExp
dynSamp/dgenLod
dynSamp/dgenStm
dynSamp/dgenUpd
dynSamp/dmlnodStm
dynSamp/dmndynStm
dynSamp/dthrdsUpd_unix
dynSamp/dthrdsUpd_win
sample/genExp
sample/genLod
sample/genStm
sample/genUpd
sample/mlnodStm
sample/mndynStm
sample/thrdsUpd_unix
sample/thrdsUpd_win
sample/genLod
dynSamp/dgenLod
dynSamp/dmndynStm
dynSamp/dthrdsUpd_win

I would like to sort out these two files. The result should be the unique lines of the first file, minus every line that appears (once or more) in the second file.

The following should be all that remains of file #1:

dynSamp/dmlnodExp
dynSamp/dmlnodLod
dynSamp/dmlnodUpd
dynSamp/dmndynLod
dynSamp/dmndynUpd
sample/mlnodExp
sample/mlnodLod
sample/mlnodUpd
sample/mndynLod
sample/mndynUpd

Can anyone please help me sort this out? Thanks!

Upvotes: 0

Views: 1274

Answers (3)

ikegami

Reputation: 385657

You didn't ask any question, so I presume you are having problems coming up with an algorithm. Here's one:

  1. Open the second file.
  2. For each line in the second file,
    1. Create an element in a hash keyed by that line.
  3. Open the first file.
  4. For each line in the first file,
    1. If the hash has no element keyed by that line,
      1. Create an element in the hash keyed by that line (so later duplicates are skipped).
      2. Print that line.

This algorithm preserves the order of the records of the first file.


Since it's rather trivial to code it, I might as well provide that too.

my %skip;
{
   open(my $fh, '<', $ARGV[1])
      or die("Can't open \"$ARGV[1]\": $!\n");
   while (<$fh>) {
      chomp;
      ++$skip{$_};
   }
}

{
   open(my $fh, '<', $ARGV[0])
      or die("Can't open \"$ARGV[0]\": $!\n");
   while (<$fh>) {
      chomp;
      print "$_\n" if !$skip{$_}++;
   }
}

Usage:

script file1 file2 >file.out

Or sorted:

script file1 file2 | sort >file.out
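
To try the idea without creating any files, the same hash logic can be applied to in-memory lists. This is only a minimal sketch: the subroutine name `unique_minus` and the shortened sample lists are made up for illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of @$first that do not occur in
# @$second, each at most once, in the order they first appear in @$first.
sub unique_minus {
    my ($first, $second) = @_;

    # Mark every line of the second list as "to be skipped".
    my %skip = map { $_ => 1 } @$second;

    # !$skip{$_}++ is true only the first time a line is seen, so repeats
    # within the first list are dropped as well.
    return grep { !$skip{$_}++ } @$first;
}

# A few lines taken from the question's sample data.
my @file1 = qw(dynSamp/dgenLod dynSamp/dmlnodLod sample/genLod dynSamp/dmlnodLod);
my @file2 = qw(dynSamp/dgenLod sample/genLod sample/genLod);

print "$_\n" for unique_minus(\@file1, \@file2);
```

With these shortened lists, only dynSamp/dmlnodLod is printed, and only once, even though it occurs twice in the first list.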

Upvotes: 3

Vijay

Reputation: 67221

It's fairly straightforward in awk with sort:

awk 'FNR==NR{a[$0];next}{if(!($0 in a))print $0}' temp2 temp | sort -u

I also think dynSam/dmndynStm should be included in your output according to your requirement, because that exact spelling (without the "p") does not appear in the second file.

> awk 'FNR==NR{a[$0];next}{if(!($0 in a))print $0}' temp2 temp | sort -u
dynSam/dmndynStm,
dynSamp/dmlnodExp,
dynSamp/dmlnodLod,
dynSamp/dmlnodUpd,
dynSamp/dmndynLod,
dynSamp/dmndynUpd,
sample/mlnodExp,
sample/mlnodLod,
sample/mlnodUpd,
sample/mndynLod,
sample/mndynUpd,
>
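
Since the question is about Perl, here is a small sketch of why that line survives: a hash lookup (like awk's `in`) compares whole strings exactly. The helper name `is_kept` is hypothetical, and only two lines of file #2 are used.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $line is not among the keys of %$seen.
sub is_kept {
    my ($seen, $line) = @_;
    return !exists $seen->{$line};
}

# Two of the lines from file #2 in the question.
my %seen = map { $_ => 1 } qw(dynSamp/dmndynStm sample/mndynStm);

# Hash keys are compared as exact strings, so the spelling without "p"
# is treated as a different line and survives the filter.
for my $line (qw(dynSamp/dmndynStm dynSam/dmndynStm)) {
    printf "%-18s %s\n", $line, is_kept(\%seen, $line) ? 'kept' : 'skipped';
}
```

If dynSam/dmndynStm is just a typo in the first file, fixing the spelling there makes it match the second file, and it then drops out of the result.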

Upvotes: 0

Red Cricket

Reputation: 10470

I think you want something like this ...

dogface@computer ~
$ cat sortit.pl
#!/usr/bin/perl -w
use strict;


my $file1 = 'file1';
my $file2 = 'file2';

my %bad;
my %good;

open BAD, '<', $file2 or die "Can't open $file2: $!\n";
while (<BAD>) {
        chomp;
        $bad{$_} = 1;
}
close BAD;

open GOOD, '<', $file1 or die "Can't open $file1: $!\n";
while( <GOOD> ) {
        chomp;
        next if $bad{$_};
        $good{$_} = 1;
}
close GOOD;

open OUT, '>', 'file3' or die "Can't open file3: $!\n";
foreach my $key ( keys %good ) {
        print OUT $key . "\n";
}
close OUT;

dogface@computer ~
$ cat file1
dynSamp/dgenExp
dynSamp/dgenLod
dynSamp/dgenStm
dynSamp/dgenUpd
dynSamp/dmlnodExp
dynSamp/dmlnodLod
dynSamp/dmlnodStm
dynSamp/dmlnodUpd
dynSamp/dmndynLod
dynSam/dmndynStm
dynSamp/dmndynUpd
sample/genExp
sample/genLod
sample/genStm
sample/genUpd
sample/mlnodExp
sample/mlnodLod
sample/mlnodStm
sample/mlnodUpd
sample/mndynLod
sample/mndynStm
sample/mndynUpd
sample/genLod
dynSamp/dgenLod
dynSamp/dmlnodLod
dynSamp/dmndynLod
sample/mndynLod
sample/mlnodLod

dogface@computer ~
$ cat file2
dynSamp/dgenExp
dynSamp/dgenLod
dynSamp/dgenStm
dynSamp/dgenUpd
dynSamp/dmlnodStm
dynSamp/dmndynStm
dynSamp/dthrdsUpd_unix
dynSamp/dthrdsUpd_win
sample/genExp
sample/genLod
sample/genStm
sample/genUpd
sample/mlnodStm
sample/mndynStm
sample/thrdsUpd_unix
sample/thrdsUpd_win
sample/genLod
dynSamp/dgenLod
dynSamp/dmndynStm
dynSamp/dthrdsUpd_win

dogface@computer ~
$ ./sortit.pl

dogface@computer ~
$ cat file3
sample/mndynLod
dynSamp/dmlnodUpd
dynSamp/dmlnodLod
dynSamp/dmlnodExp
sample/mndynUpd
sample/mlnodUpd
sample/mlnodLod
dynSamp/dmndynLod
dynSamp/dmndynUpd
sample/mlnodExp
dynSam/dmndynStm

dogface@computer ~
$

Oh, if you want file3 sorted, use the following instead:

foreach my $key ( sort keys %good ) {
        print OUT $key . "\n";
}

Upvotes: 0
