awk reparse piped input without getline

Question

I wrote an awk script that parses piped input and turns it into a well-spaced table. To achieve this I need to parse the input stream twice: first to determine the width of each column, and then to print the table itself.

#!/bin/gawk -f
# with changes from ooga

BEGIN {
    FS=" "
    "mktemp" | getline buffer   # read the temp file name printed by mktemp into buffer
    # Initialize Vars
}

{
    # Count Columns...
}

END{
    close(buffer)

    while((getline < buffer) > 0){
        # Print formatted table
    }
}

This works, but it uses getline, and all the manuals point out that there are very few cases where you really need getline. The only other option I found, though, was using files instead of pipes.

Is there another option in gawk that will parse piped input twice?

ooga · Accepted Answer

There's not really a better way to do it (EDIT: actually, it's probably better to use an array as Ed Morton has said; see his post and my alternate example at the end of this post), but it's not a very "awkish" program since it doesn't use the pattern{action} paradigm. The only advantage of awk for this program is the automatic field-splitting.

Some tips:

FS defaults to a single space, which has the special meaning that fields are separated by runs of whitespace and that leading and trailing whitespace is ignored. So there's no need to set it to a space explicitly.
|& opens a coprocess, but you only need a regular pipe, so just use |.
You should explicitly close the pipe.
The function seems an unnecessary complication.
You should delete the temporary file after you're finished with it.

This yields:

#!/bin/gawk -f

BEGIN {
    "mktemp" | getline tmpfile
    close("mktemp")
}

{
    # process and save piped data to tmpfile
}

END {
    close(tmpfile)
    while((getline < tmpfile) > 0) {
        # process data from tmpfile
    }
    system("rm " tmpfile)
}
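
To make the skeleton above concrete, here is a minimal sketch with both comment placeholders filled in, wrapped in a shell function so it can be tried straight from a prompt. The function name columnize_tmp and the variables width, line and fields are made up for illustration and are not from the original script. It assumes mktemp is on the PATH and that the name it returns contains no whitespace. The printf format is built by concatenation rather than with %-*s, on the assumption that this is more portable across awk implementations.

```sh
# Hypothetical wrapper: reads whitespace-separated rows on stdin,
# prints them as a padded table on stdout.
columnize_tmp() {
    awk '
    BEGIN {
        # Ask mktemp for a scratch file name, then close the command pipe.
        "mktemp" | getline tmpfile
        close("mktemp")
    }
    {
        # Pass 1: save the raw line and track the widest value per column.
        print > tmpfile
        if (NF > nf) nf = NF
        for (i = 1; i <= NF; i++)
            if (length($i) > width[i]) width[i] = length($i)
    }
    END {
        # Flush and close the file before reading it back.
        close(tmpfile)
        # Pass 2: re-read the saved lines and print them padded.
        while ((getline line < tmpfile) > 0) {
            split(line, fields)
            for (f = 1; f <= nf; f++)
                printf("| %-" width[f] "s ", fields[f])
            print "|"
        }
        close(tmpfile)
        system("rm -f " tmpfile)
    }'
}

# Example call with piped input.
printf 'one two three\nfour five six\n' | columnize_tmp
```

Note that the second pass still relies on getline; the temporary file only moves the buffering out of memory and onto disk.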

Here's an example of using an array instead of a temporary file:

#!/bin/awk -f

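{
    line[NR] = $0                     # keep every input line for the second pass
    if (NF > nf)
        nf = NF                       # widest row seen so far
    for (i=1; i<=NF; ++i)
        if (length($i) > flen[i])
            flen[i] = length($i)      # widest value seen in column i
}

END {
    for (r=1; r<=NR; ++r) {
        split(line[r], fields)        # split each saved line once, not once per column
        for (f=1; f<=nf; ++f)
            printf("| %-*s ", flen[f], fields[f])
        print "|"
    }
}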

Output:

$ cat file
one two three
four five six
seven eight nine
$ cat file | ./columnize.awk
| one   | two   | three |
| four  | five  | six   |
| seven | eight | nine  |
$
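
For comparison, the file-based route mentioned in the question needs neither getline nor a buffer: if the data is in a regular file, awk can be given the same file name twice and NR == FNR tells the passes apart. This is only a sketch of that alternative. The function name columnize_file and the width array are hypothetical, and it does not work on piped input, since a pipe cannot be read a second time.

```sh
# Hypothetical wrapper: takes one file name and prints its rows as a padded table.
columnize_file() {
    # The file name is passed twice so awk reads the same file in two passes.
    awk '
    NR == FNR {
        # Pass 1 (first copy of the file): record column widths.
        if (NF > nf) nf = NF
        for (i = 1; i <= NF; i++)
            if (length($i) > width[i]) width[i] = length($i)
        next
    }
    {
        # Pass 2 (second copy): print each row padded to the widths.
        for (f = 1; f <= nf; f++)
            printf("| %-" width[f] "s ", $f)
        print "|"
    }' "$1" "$1"
}
```

With the sample file from the output above, this would be called as columnize_file file.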
