Sam Berchmans
Sam Berchmans

Reputation: 127

Comparision of 2 Files using Perl

Currently i have a requirement for comparing 2 tables each table having around 700M + records.

We have come up with a idea to compare in the File based rather than taking it to the DB , based upon our DB performance. Also heard that Perl is much faster than any other coding method

We need to compare for a Mobile number the Usage plan in one File to the same mobile Number in another File, either if present and its usage is matching or not , we need to write in a new file only the Not matching records.

Example:

File 1

number, Usage type , Usage Plan , Usage Volume (KB)  ........
12344 , CP         , FB         , 100  ........
12323 , UP         , FB         , 200  ........
12322 , CP         , G+         , 300  ........

File 2

number, Usage type , Usage Plan , Usage Volume (KB)  ........
12344 , CP         , FB         , 100  ........
12323 , UP         , FB         , 210  ........

So in the above case my unmatch File should contain

Unmatch File

12323 , UP         , FB         , 210  ........
12322 , CP         , G+         , 300  ........

........ Means there are many columns after this, which we will not be using to compare. They are rather like more details about the plan.

Please share your suggestions and coding idea on this.

Our aim is to complete the comparison within 6 - 7 hours.. so that loading and other things can be completed within 2 days..

Thanks in Advance.. Sam

Upvotes: 1

Views: 86

Answers (2)

user3357237
user3357237

Reputation: 11

Here is another way that does not read the entire files into memory, which may be problem due to memory constraint. For instance, 700M record x 30 byte/rec = 21GB files.

It does requires the file to be sorted by the number when exported from DB. Assuming the number are increasing.

open FILE1, "file1";
open FILE2, "file2";
open OUT, ">out.txt";

$line1 = <FILE1>;
$line2 = <FILE2>;

sub number_part {
    ($line) = @_;
    return $1 if $line =~ /^(\d{1..9})/;
}

while (1) {
    if ($line1 eq $line2) {
        $line1 = <FILE1>;
        $line2 = <FILE2>;

    } elsif ( number_part(line1) == number_part(line2) ) {
        print OUT $line1;
        print OUT $line2;
        $line1 = <FILE1>;
        $line2 = <FILE2>;

    } elsif ( number_part($line1) < number_part($line2) ) {
        print OUT $line1;
        $line1 = <FILE1>;

    } elsif ( number_part(line1) > number_part(line2) ) {
        print OUT $line2;
        $line2 = <FILE2>;
    }

    # Use a dummy record if EOF is reached for either file.
    # Done when EOF is reached for both files.
    $line1 = "9999999999" unless $line1;
    $line2 = "9999999999" unless $line2
    last if $line eq "999999999" and $line2 eq "9999999999";
}

close(FILE1); 
close(FILE2); 
close(OUT);

Upvotes: 1

Aldo
Aldo

Reputation: 133

How about this:

use strict;

open FILE1, 'file1.txt';
open FILE2, 'file2.txt';
open OUTPUT, '>output.txt';

my $regex = qr/^ *(\d+) , (.*) , (.*) , (\d+)/;

my $file1;
while(<FILE1>){
  if(/$regex/){
    $file1->{$1}->{type} = $2;
    $file1->{$1}->{plan} = $3;
    $file1->{$1}->{volume} = $4;
  }
}

my $file2;
while(<FILE2>){
  if(/$regex/){
    $file2->{$1}->{type} = $2;
    $file2->{$1}->{plan} = $3;
    $file2->{$1}->{volume} = $4;
  }
}

my $numbers;
$numbers->{$_} = 1 foreach keys %$file1;
$numbers->{$_} = 1 foreach keys %$file2;

my $output;
foreach(keys %$numbers){
  if(defined $file1->{$_} && defined $file2->{$_}){
    if($file1->{$_}->{type} ne $file2->{$_}->{type} || $file1->{$_}->{plan} ne $file2->{$_}->{plan} || $file1->{$_}->{volume} ne $file2->{$_}->{volume}){
      push @$output, [$_, $file2->{$_}->{type}, $file2->{$_}->{plan}, $file2->{$_}->{volume}];
    }
  }elsif(defined $file1->{$_}){
    push @$output, [$_, $file1->{$_}->{type}, $file1->{$_}->{plan}, $file1->{$_}->{volume}];
  }else{
    push @$output, [$_, $file2->{$_}->{type}, $file2->{$_}->{plan}, $file2->{$_}->{volume}];
  }
}

print OUTPUT join(' , ', @$_)."\n" foreach @$output;

Upvotes: 0

Related Questions