Using shell commands within an awk script which must access the awk commands

Question

This is essentially the command I want, All of it works except that I want to print something special in my third column that would use shell commands(or just more awk commands I guess but I don't know how I would fit this into the original awk statement). All I need help with is the pseudo command substitution between $2, and ar[$4,$1] in the print statement but left the rest in for the sake of specificity.

awk 'NR==FNR{ar[$3,$2]=$1+ar[$3,$2]; }
     NR>FNR && ar[$4,$1] {print "hs"$1,$2,`awk '$1 == #$1 from outer awk command# file2 | tail -n 1 | awk '{print $3}'`, ar[$4,$1]}' file1 file2

file1 will look like

5   8       t11 
15  7       t12 
3   7       t14

file2 will look like

8 4520 5560 t11 
8 5560 6610 t12 
8 6610 7400 t13 
7 9350 10610 t11 
7 10610 11770 t12 
7 11770 14627 t13
7 14627 16789 t14

And output should look like

8 4520 7400 5
7 10610 16789 15
7 14647 16789 3

Thank-You!

agc · Accepted Answer

Non-awk, inefficient shell tools code:

while read a b c ; do \
    echo -n "$b " ; \
    egrep "^$b " file2 | \
      grep -A 9999999 " $c" | \
      cut -d' ' -f2,3 | \
      sed '1{s/ .*//;t}
           ${s/.* //;t};d' | \
      xargs echo -n  ; \
    echo " $a" ; \
done < file1 | \
  column -t

Output:

8  4520  7400   5
7  10610 16789  15

The main loop inputs file1 which controls what in file2 needs to be printed. file1 has 3 fields, so read needs 3 variables: $a, $b, and $c. The output uses $b and $a, so those two variables come "for free" -- the first and last lines of the main loop, (both echos), prefix $b and suffix $a to the two numbers in the middle of each line.

The egrep prints every line in file2 that begins with $b, but of those lines we only want the one that ends in $c plus the lines after that, which is what grep -A ... prints. Only the middle two columns are needed, so cut prints just those columns. Now we have a two column block of numbers, and we only want the upper left corner, or the lower right corner, which the sed code prints...

Any sed code automatically counts lines as it runs. When sed hits the first line, it runs what's in the first set of curly brackets, ('1{<code>}'). If that fails sed checks if it's the last line, ($ means last line), if it is, sed runs what's in the second set of curly brackets, ('${<code>}'). If it's not the first or last line sed deletes it.

Inside those curly brackets: s/ .*// works just like cut -f 1 would. The closing t means 'GOTO label', but when there's no 'label' sed just starts a new cycle, reading another line -- without t, the code would run the d, and print nothing. With two fields, s/.* // works like cut -f 2, etc.

Each pass of the main while loop sed prints two numbers, but each is on it's own line. Piping that to xargs echo -n puts both numbers on the same line as the $b was printed on.

Using shell commands within an awk script which must access the awk commands

Answers (1)

Related Questions