HollowBastion

Reputation: 223

Search and print specific columns from tab delimited file?

I can use awk to print the nth column of a file, and cut can do something similar, but I need to select columns by name. For example, given:

col1 col2 col3 col4
2 5 3 1
6 4 7 1 
3 6 5 9
7 9 7 8

and a list of column names as input, e.g. col1, col3 (it is going to be a long list of column names, so it would help if the input could be an array),

the output would be

col1 col3
2 3
6 7 
3 5
7 7

Does anyone know how I might do this in bash?

Upvotes: 1

Views: 556

Answers (2)

Ed Morton

Reputation: 203189

$ cat tst.awk
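BEGIN { FS=OFS="\t" }
# Header line: record the field numbers of the requested columns.
NR==1 {
    for (i=1;i<=NF;i++) {
        # cols is a space-separated list; the header name is used as a regexp here
        if ( match(cols,"(^| )"$i"( |$)") ) {
            colNrs[++numCols] = i
        }
    }
}
# Every line (header included): print only the recorded fields.
{
    for (i=1;i<=numCols;i++) {
        printf "%s%s", $(colNrs[i]), (i<numCols?OFS:ORS)
    }
}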

$ awk -v cols="col1 col3" -f tst.awk file
col1    col3
2       3
6       7
3       5
7       7

Upvotes: 1

John1024

Reputation: 113814

$ awk -v s="col1 col3" 'BEGIN{split(s,v," ");for (i=1;i<=length(v);i++)a[v[i]]=1} NR==1{split($0,b,"\t")} {for (i=1;i<=NF;i++)if (b[i] in a)printf "%s\t",$i;print""}' file
col1    col3
2       3
6       7
3       5
7       7

How it works

  • -v s="col1 col3"

    Define an awk variable s containing a space-separated list of the columns that you want to keep.

  • BEGIN{split(s,v," ");for (i=1;i<=length(v);i++)a[v[i]]=1}

    Create an associative array a whose keys are the column names listed in the string s, each with the value 1.

  • NR==1{split($0,b,"\t")}

    Save the column names from the header line in an array b, indexed by column number.

  • for (i=1;i<=NF;i++) if (b[i] in a) printf "%s\t",$i; print""

    For each column i, if its name b[i] is a key in array a, print the field followed by a tab. Note that this leaves a trailing tab at the end of each output line.

    To finish, print "" prints a newline.
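
    The question also asks for the column names to come from a bash array. The following is only a sketch of one way to do that, not part of the original one-liner: select_cols is a hypothetical helper name, the sample file is created just for the demonstration, and it assumes column names contain no spaces or backslashes. It joins the array into the space-separated string that awk expects, and it looks up the wanted field numbers once on the header line so that no trailing tab is printed.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the answers above): select_cols FILE NAME...
# Prints the named columns of a tab-delimited FILE, in file order.
select_cols() {
    local file=$1
    shift
    local IFS=' '          # make "$*" join the remaining arguments with spaces
    awk -v s="$*" '
        BEGIN {
            FS = OFS = "\t"
            n = split(s, v, " ")                  # split() returns the element count
            for (i = 1; i <= n; i++) want[v[i]] = 1
        }
        # Header line: remember which field numbers were requested.
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i in want) keep[++k] = i
        }
        # Every line: print only the remembered fields, tab-separated.
        {
            for (j = 1; j <= k; j++)
                printf "%s%s", $(keep[j]), (j < k ? OFS : ORS)
        }
    ' "$file"
}

# Demonstration data (assumed sample, modeled on the question).
sample=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n2\t5\t3\t1\n6\t4\t7\t1\n' > "$sample"

cols=(col1 col3)                      # the list of names as a bash array
select_cols "$sample" "${cols[@]}"
```

    Passing "${cols[@]}" keeps each name as a separate argument; the join into a single string happens inside the function, so the caller never has to build the list by hand.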

Upvotes: 1
