Ell

Reputation: 947

Awk solution for table formatting

There is a table in the format below. Is it possible to write an AWK script that reformats the table so that it excludes the columns whose values (ignoring the header row) are all the number "1"?

ST L1 L2 L3 L4 L5
ST2 1 1 1 1 1
ST2 1 0 1 0 1
ST3 1 0 1 0 1
ST3 0 0 1 1 1
ST4 1 0 1 0 1
ST5 1 0 1 0 1
ST6 1 0 1 0 1
ST7 0 0 1 1 1
ST8 0 0 1 0 1
ST9 1 0 1 0 1

The output should be as follows:

ST L1 L2 L4
ST2 1 1 1
ST2 1 0 0
ST3 1 0 0
ST3 0 0 1
ST4 1 0 0
ST5 1 0 0
ST6 1 0 0
ST7 0 0 1
ST8 0 0 0
ST9 1 0 0

I can roughly see the logic for deciding whether a column should be printed: for each column (skipping the header row, NR==1, and the first column, $1), increment a counter every time a 1 is found. In the END block, a column whose counter equals the number of data rows (NR-1) contains only 1s and should be left out, while every other column should be printed. My trouble lies in actually printing the columns in the END block: I am trying to use arrays, and I am still learning AWK and arrays. I am sure there is some clever way of doing this, perhaps without arrays at all, simply by changing the way AWK looks at the data.
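The counting idea above can be sketched in awk as follows. This is a minimal sketch, not code from either answer: the shell function name `drop_all_ones_columns` is hypothetical, the table is assumed to be whitespace-separated with the header on the first line, and every line is still kept in an array, because whether a column holds only 1s is known only after the last row has been read (the alternative is to read the input twice).

```shell
# Hypothetical wrapper name; reads the files named as arguments, or stdin.
# Assumes a whitespace-separated table whose first line is the header.
drop_all_ones_columns() {
    awk '
    {
        line[NR] = $0                     # keep every row for the END block
        if (NR > 1)                       # the header is not counted
            for (i = 1; i <= NF; i++)
                if ($i == 1) ones[i]++    # number of 1s seen in column i
    }
    END {
        rows = NR - 1                     # number of data rows
        for (r = 1; r <= NR; r++) {
            n = split(line[r], f)         # re-split the stored row
            out = ""; sep = ""
            for (i = 1; i <= n; i++) {
                if (rows > 0 && ones[i] == rows)
                    continue              # column holds only 1s: leave it out
                out = out sep f[i]
                sep = OFS
            }
            print out
        }
    }' "$@"
}
```

For the sample table, `drop_all_ones_columns file` prints the expected output shown above. The `rows > 0` guard keeps a header-only input from losing all of its columns.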

Upvotes: 2

Views: 395

Answers (2)

kuroi neko

Reputation: 8651

This should do the trick:

    {
        # store current line
        line[FNR] = $0

        if (FNR > 1) # skip header
        {
            # mark each column that contains a value other than 1
            for (i = 1 ; i <= NF ; i++)
            {
                if ($i != 1) selected[i] = 1
            }
        }
    }

    END {
        for (li = 1 ; li <= FNR ; li++)
        {
            # parse current line
            $0 = line[li]

            # pick selected fields
            for (i = j = 1 ; i <= NF ; i++)
            {
                if (selected[i]) $(j++) = $i
            }

            # drop the leftover trailing fields (awk rebuilds $0 with OFS)
            NF = j-1
            print
        }
    }

After Ed Morton's remarks:

  • renamed the variable l to something less ambiguous (li)
  • printf is a statement indeed, but adding parentheses won't hurt either, or would it?
  • agreed that print "" is better than printf "\n"
  • semicolons are optional but won't hurt. I feel more comfortable with something that looks like C
  • NR was a typo that went unnoticed (since it produced the expected output by pure luck). I meant NF.
  • changed the logic so that no trailing blanks are added (and printf is not used anymore)

After a second batch of remarks:

  • changed how the output record is generated, to avoid extraneous separators
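The difference between blanking an unwanted field and actually removing it can be seen in a small sketch; the helper names are hypothetical. Removing a field by shifting the later fields left and then lowering NF relies on awk rebuilding $0 when NF is decremented, which gawk and mawk do, although the gawk manual cautions that some awk versions do not.

```shell
# Hypothetical helper names, for illustration only.

# Blanking a field keeps the separators on both sides, so a double blank remains.
blank_second_field() {
    awk '{ $2 = ""; print }'
}

# Shifting the later fields left and then lowering NF removes the field
# cleanly (gawk and mawk rebuild $0 from the remaining fields with OFS).
drop_second_field() {
    awk 'NF >= 2 { for (i = 2; i < NF; i++) $i = $(i + 1); NF-- } { print }'
}

echo 'a b c' | blank_second_field    # prints "a  c" (two blanks)
echo 'a b c' | drop_second_field     # prints "a c"
```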

Thanks a lot for the proofreading. It's been nearly 15 years since I last did any serious awk programming, and the rust has sadly set in.

Upvotes: 2

Ed Morton

Reputation: 203254

    $ awk '
    NR==FNR {
        if (NR > 1) {
            for (i=1;i<=NF;i++) {
                if ($i != 1) {
                    nonOnes[i]
                }
            }
        }
        next
    }
    {
        ofs=""
        for (i=1;i<=NF;i++) {
            if (i in nonOnes) {
                printf "%s%s", ofs, $i
                ofs=OFS
            }
        }
        print ""
    }
    ' file file
    ST L1 L2 L4
    ST2 1 1 1
    ST2 1 0 0
    ST3 1 0 0
    ST3 0 0 1
    ST4 1 0 0
    ST5 1 0 0
    ST6 1 0 0
    ST7 0 0 1
    ST8 0 0 0
    ST9 1 0 0
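The `NR==FNR` test is what separates the two passes: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while awk reads the first copy of the file. A minimal illustration of the same idiom, with a hypothetical helper name, numbers each line as `line/total`, which likewise needs a value (the line count) that is known only after a full pass:

```shell
# Hypothetical helper: print every line of the file named by $1 prefixed
# with "line/total", reading the file twice.
number_lines_of_total() {
    awk '
    NR == FNR { total++; next }           # first copy: only count the lines
    { print FNR "/" total ": " $0 }       # second copy: FNR restarts at 1
    ' "$1" "$1"
}
```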

If you don't want to list the same file twice on the command line, you can add this BEGIN section:

    BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
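Combined with the two-pass script, this lets the table file be named only once. The wrapper below is a sketch with a hypothetical function name; because the BEGIN block duplicates the last command-line argument, it needs a real file name and will not work on standard input.

```shell
# Hypothetical wrapper name; $1 must be a file name (stdin cannot be read twice).
drop_all_ones_columns_once() {
    awk '
    BEGIN { ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }   # queue the last file a second time
    NR == FNR {                                   # first pass: find columns to keep
        if (NR > 1) {                             # skip the header row
            for (i = 1; i <= NF; i++) {
                if ($i != 1) nonOnes[i]           # column i has a value other than 1
            }
        }
        next
    }
    {                                             # second pass: print kept columns
        ofs = ""
        for (i = 1; i <= NF; i++) {
            if (i in nonOnes) {
                printf "%s%s", ofs, $i
                ofs = OFS
            }
        }
        print ""
    }' "$1"
}
```

Usage: `drop_all_ones_columns_once file`.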

Upvotes: 2
