Reputation: 2497

Pattern Matching in Columns

File 1

A11;F1;BMW
A23;F2;BMW
B12;F3;BMW
H11;F4;JBW

File 2

P01;A1;0;0--00  ;123;456;150
P01;A11;0;0--00  ;123;444;208
P01;B12;0;0--00  ;123;111;36
P01;V11;0;0--00  ;123;787;33.9

Output

-;-;-;P01;A1;0;0--00  ;123;456;150
A11;F1;BMW;P01;A11;0;0--00  ;123;444;208
B12;F3;BMW;P01;B12;0;0--00  ;123;111;36
-;-;-;P01;V11;0;0--00  ;123;787;33.9

I TRIED

awk 'FNR==NR {a[$2] = $0; next }{ if($1 in a) {p=$1;$1="";print a[p],$0}}' File1 File2

But didnt work.

Basically I want to get the details from FILE 1 and compare with FILE2 (master list) .

Example :

A1 in FILE2 was not available in FILE1 , so in output file we have “-“ for 1st three fields and rest from FILE2 . Now, we have A11 and we got the detail in FILE1. So we write details of A11 from both File 1 & 2

Upvotes: 1

Answers (5)

Borodin

Reputation: 126772

This can't be done conveniently in a one-line program as it involves reading two input files, however the problem isn't difficult

This program reads all the lines from file1, and uses the first field as a key to store the line in a hash

Then all the lines from file2 are read and the second field used as a key to access the hash. The // defined-or operator is used to print either element's value is printed if it exists, or the default string if not

Finally the current line from file2 is printed

use strict;
use warnings;

my %data;

open my $fh, '<', 'file1' or die $!;
while (<$fh>) {
  chomp;
  my $key = (split /;/)[0];
  $data{$key} = $_;
}

open $fh, '<', 'file2' or die $!;
while (<$fh>) {
  my $key = (split /;/)[1];
  print $data{$key} // '-;-;-;', $_;
}

output

-;-;-;P01;A1;0;0--00  ;123;456;150
A11;F1;BMWP01;A11;0;0--00  ;123;444;208
B12;F3;BMWP01;B12;0;0--00  ;123;111;36
-;-;-;P01;V11;0;0--00  ;123;787;33.9

Upvotes: 1

TLP

Reputation: 67930

A perl solution might include the very nice module Text::CSV. If so, you might extract the values into a hash, and later use that hash for lookup. When looking up values, you would insert blank values -;-;-; for any undefined values in the lookup hash.

use strict;
use warnings;
use Text::CSV;

my $lookup = "file1.csv";   # whatever file is used to look up fields 0-2
my $master = "file2.csv";   # the file controlling the printing

my $csv = Text::CSV->new({
        sep_char    => ";", 
        eol         => $/,  # to add newline to $csv->print()
        quote_space => 0,   # to avoid adding quotes 
    });

my %lookup;

open my $fh, "<", $lookup or die $!;

while (my $row = $csv->getline($fh)) {
    $lookup{$row->[0]} = $row;    # add entire row to specific key
}
open $fh, "<", $master or die $!; # new $fh needs no close

while (my $row = $csv->getline($fh)) {
    my $extra = $lookup{$row->[1]} // [ qw(- - -) ]; # blank row if undef
    unshift @$row, @$extra;       # add the new values
    $csv->print(*STDOUT, $row);   # then print them
}

Output:

-;-;-;P01;A1;0;0--00  ;123;456;150
A11;F1;BMW;P01;A11;0;0--00  ;123;444;208
B12;F3;BMW;P01;B12;0;0--00  ;123;111;36
-;-;-;P01;V11;0;0--00  ;123;787;33.9

Upvotes: 1

perreal

Reputation: 98118

With Perl:

use warnings;
use strict;
my %file1;
open (my $f1, "<", "file1") or die();
while (<$f1>) {
  chomp;
  my @v = (split(/;/))[0];
  $file1{$v[0]} = $_; 
}
close ($f1);
open (my $f2, "<", "file2") or die();
while (<$f2>) {
  chomp;
  my $v = (split(/;/))[1];
  if (defined $file1{$v}) {
    print "$file1{$v};$_\n";
  } else {
    print "-;-;-;$_\n";
  }
}
close ($f2);

Upvotes: 2

choroba

Reputation: 242383

The usual Perl way: use a hash to remember the master list:

#!/usr/bin/perl
use warnings;
use strict;

my %hash;

open my $MASTER, '<', 'File1' or die $!;
while (<$MASTER>) {
    chomp;
    my @columns = split /;/;
    $hash{$columns[0]} = [@columns[1 .. $#columns]];
}
close $MASTER;

open my $DETAIL, '<', 'File2' or die $!;
while (<$DETAIL>) {
    my @columns = split /;/;
    if (exists $hash{$columns[1]}) {
        print join ';', $columns[1], @{ $hash{$columns[1]} }, q();
    } else {
        print '-;-;-;';
    }
    print;
}
close $DETAIL;

Upvotes: 3

ruakh

Reputation: 183602

I would do this in Perl, personally, but since everyone and their mother is giving you a Perl solution, here's an alternative:

Provided that the records in each file have a consistent number of fields, and provided that the records in each file are sorted by the "join" field in lexicographic order, you can use join:

join -1 1 -2 2 -t ';' -e - -o '1.1 1.2 1.3 2.1 2.2 2.3 2.4 2.5 2.6 2.7' -a 2 File1 File2

Explanation of options:

-1 1 and -2 2 mean that the "join" field (A11, A23, etc.) is the first field in File1 and the second field in File2.
-t ';' means that fields are separated by ;
-e - means that empty fields should be replaced by -
-o '1.1 1.2 1.3 2.1 2.2 2.3 2.4 2.5 2.6 2.7' means that you want each output-line to consist of the first three fields from File1, followed by the first seven fields from File2. (This is why this approach requires that the records in each file have a consistent number of fields.)
-a 2 means that you want to include every line from File2 in the output, even if there's no corresponding line from File1. (Otherwise it would only output lines that have a match in both files.)

Upvotes: 5

Pattern Matching in Columns

Answers (5)

Related Questions