Arivazhagan

Join two files including unmatched lines in Shell

File1.log

207.46.13.90  37556
157.55.39.51  34268
40.77.167.109 21824
157.55.39.253 19683

File2.log

207.46.13.90  62343
157.55.39.51  58451
157.55.39.200 37675
40.77.167.109 21824

The expected Output.log is:

207.46.13.90    37556   62343
157.55.39.51    34268   58451
157.55.39.200   -----   37675
40.77.167.109   21824   21824
157.55.39.253   19683   -----

I tried the 'join' command below, but it skips the unmatched lines:

join --nocheck-order File1.log File2.log

It outputs the following (not what I expected):

207.46.13.90  37556 62343
157.55.39.51  34268 58451
40.77.167.109 21824 21824

Could someone please help with the proper command to get the desired output? Thanks in advance.


Answers (2)

RavinderSingh13

Could you please try the following.

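awk '
# First file (File2.log): remember each IP and its count
FNR==NR{
  a[$1]=$2
  next
}
# Second file (File1.log): IP found in both files
($1 in a){
  print $0, a[$1]
  b[$1]
  next
}
# IP found only in File1.log
{
  print $1, $2, "-----"
}
# IPs found only in File2.log (for-in order is unspecified)
END{
  for(i in a){
    if(!(i in b)){
      print i, "-----", a[i]
    }
  }
}
' File2.log File1.log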

The output will be as follows.

207.46.13.90  37556 62343
157.55.39.51  34268 58451
40.77.167.109 21824 21824
157.55.39.253 19683 -----
157.55.39.200 ----- 37675
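
The END loop above walks the array with for (i in a), and awk leaves the order of that loop unspecified, so the File2-only lines can come out in any order. If the exact row order of the expected Output.log matters (File2.log order first, then the IPs found only in File1.log), a sketch along these lines could work. It assumes each IP occurs at most once per file and that File1.log is not empty; merge_in_order is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Recreate the sample inputs from the question.
printf '%s\n' '207.46.13.90  37556' '157.55.39.51  34268' \
  '40.77.167.109 21824' '157.55.39.253 19683' > File1.log
printf '%s\n' '207.46.13.90  62343' '157.55.39.51  58451' \
  '157.55.39.200 37675' '40.77.167.109 21824' > File2.log

# Hypothetical helper: merge_in_order FIRST SECOND
# Prints "IP count_in_FIRST count_in_SECOND", using ----- for a missing count.
# Assumes unique IPs per file and a non-empty FIRST file.
merge_in_order() {
  awk '
  # Pass 1 (first file): remember each IP, its count and its position.
  FNR == NR { cnt1[$1] = $2; order[++n] = $1; next }
  # Pass 2 (second file): keep its line order, fill a missing count.
  {
    c1 = ($1 in cnt1) ? cnt1[$1] : "-----"
    print $1, c1, $2
    seen[$1] = 1
  }
  # Finally append IPs found only in the first file, in their original order.
  END {
    for (i = 1; i <= n; i++)
      if (!(order[i] in seen))
        print order[i], cnt1[order[i]], "-----"
  }
  ' "$1" "$2"
}

merge_in_order File1.log File2.log
```

Piping the result through column -t aligns the columns as in the question. The array order keeps the output deterministic instead of relying on for-in.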


KamilCuk

The following is enough if you don't care about the order of the output lines:

join -a1 -a2 -e----- -oauto <(sort file1.log) <(sort file2.log) |
column -t -s' ' -o'   '

With the input files recreated like this:

cat <<EOF >file1.log
207.46.13.90  37556
157.55.39.51  34268
40.77.167.109 21824
157.55.39.253 19683
EOF
cat <<EOF >file2.log
207.46.13.90  62343
157.55.39.51  58451
157.55.39.200 37675
40.77.167.109 21824
EOF

outputs:

157.55.39.200   -----   37675
157.55.39.253   19683   -----
157.55.39.51    34268   58451
207.46.13.90    37556   62343
40.77.167.109   21824   21824

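By default, join joins on the first field of each file. The -a1 -a2 options make it also print the unpairable lines from the first and the second input. The -e----- option replaces missing fields with -----, and -oauto infers the output format from the first line of each input, so every output line gets the join field followed by the remaining fields of both files. join needs its inputs sorted on the join field; since that is the first column, a plain sort is enough and there is no need to pass -k1, although sort -s -k1 could speed things up. To match the expected output, I also piped the result to column.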
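
To see what each option contributes, the sketch below runs three variants side by side on the same sample files. It assumes GNU coreutils join (-o auto is a GNU extension) and bash for the <( ) process substitution; the function names inner_join, outer_join_raw and full_join are hypothetical. LC_ALL=C is exported so that sort and join agree on the collation order.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils join (for -o auto) and bash (for <( ) process substitution).
export LC_ALL=C   # sort and join must use the same collation order

printf '%s\n' '207.46.13.90  37556' '157.55.39.51  34268' \
  '40.77.167.109 21824' '157.55.39.253 19683' > file1.log
printf '%s\n' '207.46.13.90  62343' '157.55.39.51  58451' \
  '157.55.39.200 37675' '40.77.167.109 21824' > file2.log

# Inner join: IPs present in only one file are dropped.
inner_join() { join <(sort "$1") <(sort "$2"); }

# -a1 -a2: unpaired lines are kept, but without a placeholder, so
# "157.55.39.200 37675" no longer tells you which file it came from.
outer_join_raw() { join -a1 -a2 <(sort "$1") <(sort "$2"); }

# -oauto fixes the layout (join field, then the other fields of each file)
# and -e----- fills in the fields missing from one side.
full_join() { join -a1 -a2 -e----- -oauto <(sort "$1") <(sort "$2"); }

for f in inner_join outer_join_raw full_join; do
  echo "--- $f"
  "$f" file1.log file2.log
done
```

Only the last variant keeps all five IPs with a fixed three-column layout, which is what column then aligns.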

You can sort the output by the numeric columns by piping it to, for example, sort -rnk2,3.
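
As a sketch of that last step, assuming the same GNU join and bash setup: with -n, sort compares only the leading number of each key, so -rnk2,3 effectively orders by column 2. The version below spells that out with one key for column 2 and a tie-breaking key for column 3, largest first. join_sorted_by_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Assumes GNU coreutils join (for -o auto) and bash process substitution.
export LC_ALL=C

printf '%s\n' '207.46.13.90  37556' '157.55.39.51  34268' \
  '40.77.167.109 21824' '157.55.39.253 19683' > file1.log
printf '%s\n' '207.46.13.90  62343' '157.55.39.51  58451' \
  '157.55.39.200 37675' '40.77.167.109 21824' > file2.log

# Hypothetical helper: full outer join, then sort by column 2 (file1 count)
# and, for ties, column 3 (file2 count), both numerically and descending.
# The ----- placeholder contains no digits, so it sorts below every real count.
join_sorted_by_count() {
  join -a1 -a2 -e----- -oauto <(sort "$1") <(sort "$2") |
    sort -k2,2nr -k3,3nr
}

join_sorted_by_count file1.log file2.log
```

Appending | column -t aligns the sorted result the same way as before.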

