Novice
Novice

Reputation: 169

How to merge files based on a common column?

There are 2 files, file1 is the output of a telnet command i.e.

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]  
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] 

and another is the file2, which is like

http://ip1:port1/, ZOOM1  
http://ip2:port2/, ZOOM2  
http://ip3:port3/, ZOOM3

I need to merge these 2 files based on common IP and port. The output should be a 3rd file like:

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ]  ZOOM1  
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ]  ZOOM2

I tried join, but join gives errors in my shell. Any help without join would be highly helpful.

I tried join, this works as a command line, but fails in a shell script, both in bash and sh. Moreover it does not match, it just copy-pastes.

paste -d " : " file1 <(cut -s -d ',' -f2 file2)

I also tried the awk command, but it does not process files as expected.

awk 'NR==FNR {h[$2] = $3; next} {print $1,$2,$3,h[$2]}' file2 file1 > file3

Upvotes: 3

Views: 663

Answers (5)

tshiono
tshiono

Reputation: 22087

Please try the following:

awk 'NR==FNR {h[$1]=$2; next} {print $0" "h[$7]}' <(sed "s/,//" file2) file1

Result:

25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] ZOOM1
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] ZOOM2

Upvotes: 1

Kjetil S.
Kjetil S.

Reputation: 3795

#!/usr/bin/perl
my %z = split/[, \n]+/, qx(cat file2);  # read file2 into %z for lookups
my @file1 = split/\n/, qx(cat file1);   # read lines of file1 into @file1
for( @file1 ){                          # for each line of file1
  /http\S+/;                     # find the url, \S+ is non-space chars
  print "$_ $z{$&}\n";           # url in $& print the line and "the zoom" from %z
}

Replace file1 and file2 with $ARGV[0] and $ARGV[1] if you want to get the filenames from the command line. I don't know if /usr/bin/parse and awk could work in this situation like you suggest. Would be interesting to see how. Perl is superior to awk in most cases.

Upvotes: 2

xiawi
xiawi

Reputation: 1822

It can be done with awk but it would be easier with the same separator in both file. Thus first remove the comma in file2:

sed -i.old 's/,//' file2

The you can process with:

awk '{

    if(FILENAME=="file1"){
        m[$7]=$0
    }
    else {
        if(m[$1]!=""){
           print m[$1],$2
        }
    }
}' file1 file2

it first registers the content of file1 into a map with a key on the http://... and a value that contains the full line ($0). Then it processes file2 and displays what is expected if the second column of file2 corresponds to a key of the map.

In your particular case, you can do all in one line with:

awk -F'[ ,]' '{

    if(FILENAME=="file1"){
        m[$7]=$0
    }
    else {
        if(m[$1]!=""){
           print m[$1],$2
        }
    }
}' file1 file2

Both space and comma are considered as separator by awk

Upvotes: 1

Shawn
Shawn

Reputation: 52644

Using join is a bit complicated because the two files have different delimiters, but:

$ join -17 -21 -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12,2.2 <(sort -k7,7 a.txt) <(sort -k1,1 -t, b.txt | tr -d ',')
25-08-2019 : Port port1 of URL http://ip1:port1/ is [ NOT OPEN ] ZOOM1
25-08-2019 : Port port2 of URL http://ip2:port2/ is [ NOT OPEN ] ZOOM2

If the files are already sorted based on the URL, the sort bits can be removed, though you still need to strip the commas from the second file.

Upvotes: 3

Dave Cross
Dave Cross

Reputation: 69314

Read file2 into a hash and then process file1 a line at a time, extracting the key and looking it up in your hash. Something like this:

#!/usr/bin/perl

use strict;
use warnings;

open my $fh2, '<', 'file2' or die $!;

my %data_hash = map { split /,/ } <$fh2>;

close $fh2;

open my $fh1, '<', 'file1' or die $!;

while (<$fh1>) {
  if (my ($key) = /\b(http:\S+)/) {
    if (exists $data_hash{$key}) {
      chomp;
      print "$_ $data_hash{$key}";
    } else {
      # Key doesn't exist in file2
      print;
    }
  } else {
    # No http key found on a line in file1
    print;
  }
}

Upvotes: 1

Related Questions