Moehre

Reputation: 151

How to compare two CSV files in Perl

I have two CSV files (A and B).

File A

ADDRESSLINKID;NAMEINDEX;ADDRESSLINKINDEX;ADMINREGIONID;TOWNREGIONID;SIDE
1;19;;;1;0
2;21;;;2;0
3;23;;;3;0

File B

ID;DISPLAYTYPE;URBAN;LINKLENGTH;PARENTID;SOURCEID;TRUCKTOLL
1;19;;;;1;0
2;21;;;;2;0
3;23;;;;3;0

Now I want to read the field "SOURCEID" from File B for every row whose value also exists in the field "ADDRESSLINKID" of File A (that is, read SOURCEID where ADDRESSLINKID eq SOURCEID).

sub check_segments {

my $fileA = $dirA."\\lux.adl";
my $fileB = $dirA."\\lux.lin";

open my $fh, '<', $fileA or die "Could not open '$fileA' $!\n";
...

Can I do this with the grep operator?

The output should be: result = (1,2,3,4...)

Upvotes: 0

Views: 236

Answers (2)

ceving

Reputation: 23856

Perl is able to execute SQL statements on CSV files via the DBI using DBD::CSV.

The following example requires that your input files are named file_a.csv and file_b.csv and that they are in the current directory. Change f_dir if your files are in a different location.

#! /usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect ('dbi:CSV:', '', '',
                        { f_dir => '.',
                          f_ext => ".csv/r",
                          csv_sep_char => ';' })
    || die $DBI::errstr;

$dbh->{RaiseError} = 1;

my $result = $dbh->selectall_arrayref ('
select sourceid
from file_a, file_b
where file_a.addresslinkid = file_b.sourceid
');

print 'result = (', join (',', map { @$_ } @$result), ")\n";

On Ubuntu, this requires the packages libdbd-csv-perl and libsql-statement-perl.

Upvotes: 1

Borodin

Reputation: 126722

I'm not very impressed with your question, or with your attempt at a solution, which goes only as far as opening a file.

Nevertheless, here's a working program that does what I think you want.

use strict;
use warnings 'all';

use List::Util 'first';

my $file_a = 'fileA.txt';
my $file_b = 'fileB.txt';

my @link_ids   = fetch_file_column($file_a, 'ADDRESSLINKID');
my @source_ids = fetch_file_column($file_b, 'SOURCEID');

my %link_ids = map { $_ => 1} @link_ids;

my @result = grep { $link_ids{$_} } @source_ids;
printf "result = (%s)\n", join ',', @result;


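# Return the values of the named column from a semicolon-separated file
sub fetch_file_column {
    my ($file, $column) = @_;

    open my $fh, '<', $file or die qq{Unable to open "$file" for input: $!};

    # Read the header line and split it into column names
    my @headers;
    for ( scalar <$fh> ) {
        chomp;
        @headers = split /;/;
    }

    # Find the position of the requested column
    my $idx = first { $headers[$_] eq $column } 0 .. $#headers;
    die qq{Header "$column" not found in "$file"} unless defined $idx;

    # Extract that field from each remaining line
    map { chomp; ( split /;/ )[$idx];  } <$fh>
}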

output

result = (1,2,3)
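
The same hash-plus-grep lookup can be tried without any files on disk. What follows is only a sketch under stated assumptions: column_from_string is a hypothetical helper that is not part of the program above, and the data is the sample from the question held in here-documents instead of being read from fileA.txt and fileB.txt.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): return the values
# of one named column from semicolon-separated text held in a string.
sub column_from_string {
    my ($text, $column) = @_;

    # Split into lines, tolerating both Unix and Windows line endings,
    # and drop empty lines.
    my @lines   = grep { length } split /\r?\n/, $text;
    my @headers = split /;/, shift @lines;

    # Map each header name to its position in the row.
    my %index_of;
    @index_of{@headers} = 0 .. $#headers;
    die qq{Header "$column" not found} unless exists $index_of{$column};

    # The -1 limit makes split keep trailing empty fields.
    return map { ( split /;/, $_, -1 )[ $index_of{$column} ] } @lines;
}

# Sample data copied from the question.
my $file_a = <<'END_A';
ADDRESSLINKID;NAMEINDEX;ADDRESSLINKINDEX;ADMINREGIONID;TOWNREGIONID;SIDE
1;19;;;1;0
2;21;;;2;0
3;23;;;3;0
END_A

my $file_b = <<'END_B';
ID;DISPLAYTYPE;URBAN;LINKLENGTH;PARENTID;SOURCEID;TRUCKTOLL
1;19;;;;1;0
2;21;;;;2;0
3;23;;;;3;0
END_B

# Build a lookup hash from File A, then keep the SOURCEID values from
# File B that appear in it.
my %is_link_id = map { $_ => 1 } column_from_string($file_a, 'ADDRESSLINKID');
my @result     = grep { defined($_) && $is_link_id{$_} }
                 column_from_string($file_b, 'SOURCEID');

printf "result = (%s)\n", join ',', @result;
```

The hash makes each membership test a single lookup, so grep only has to walk the SOURCEID list once instead of comparing every value against every ADDRESSLINKID.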

Upvotes: 1
