Reputation: 223
I can use awk to print the nth column from a file, and the cut command can do something similar, but I need the columns to be selected by name. For example, given:
col1 col2 col3 col4
2 5 3 1
6 4 7 1
3 6 5 9
7 9 7 8
if I give a list of column names as input, e.g. col1 and col3 (it is going to be a long list of column names, so it would help if the input could be an array),
the output would be:
col1 col3
2 3
6 7
3 5
7 7
Does anyone know how I might do this in bash?
Upvotes: 1
Views: 556
Reputation: 203189
$ cat tst.awk
BEGIN { FS=OFS="\t" }
NR==1 {
    for (i=1;i<=NF;i++) {
        if ( match(cols,"(^| )"$i"( |$)") ) {
            colNrs[++numCols] = i
        }
    }
}
{
    for (i=1;i<=numCols;i++) {
        printf "%s%s", $(colNrs[i]), (i<numCols?OFS:ORS)
    }
}
$ awk -v cols="col1 col3" -f tst.awk file
col1 col3
2 3
6 7
3 5
7 7
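Since the question asks for the column names to come from an array, here is a minimal sketch of one way to feed a bash array into the same awk logic. The array name `wanted`, the temporary file and its sample rows are illustrative assumptions, and it assumes IFS has its default value so that `"${wanted[*]}"` joins the elements with single spaces.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, tab-separated to match FS="\t" in the awk script.
tmp=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n2\t5\t3\t1\n6\t4\t7\t1\n' > "$tmp"

# Column names to keep, held in a bash array (names are illustrative).
wanted=(col1 col3)

# "${wanted[*]}" expands to one word: the elements joined by a space
# (the first character of the default IFS).
result=$(awk -v cols="${wanted[*]}" '
BEGIN { FS = OFS = "\t" }
NR == 1 {
    # Record the number of every header field that appears in cols.
    for (i = 1; i <= NF; i++)
        if (match(cols, "(^| )" $i "( |$)"))
            colNrs[++numCols] = i
}
{
    # Print only the recorded columns, tab-separated, for every line.
    for (i = 1; i <= numCols; i++)
        printf "%s%s", $(colNrs[i]), (i < numCols ? OFS : ORS)
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Column names containing spaces or regex metacharacters would need extra care with this approach, because the names are joined into one space-separated string and then matched as part of a regular expression.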
Upvotes: 1
Reputation: 113814
$ awk -v s="col1 col3" 'BEGIN{split(s,v," ");for (i=1;i<=length(v);i++)a[v[i]]=1} NR==1{split($0,b,"\t")} {for (i=1;i<=NF;i++)if (b[i] in a)printf "%s\t",$i;print""}' file
col1 col3
2 3
6 7
3 5
7 7
-v s="col1 col3"
Define an awk variable s containing a space-separated list of the columns that you want to keep.
BEGIN{split(s,v," ");for (i=1;i<=length(v);i++)a[v[i]]=1}
Create an associative array a whose keys are the column names listed in the string s and whose values are 1.
NR==1{split($0,b,"\t")}
On the header line, save the column names in an array b, indexed by column number.
for (i=1;i<=NF;i++) if (b[i] in a) printf "%s\t",$i; print""
For each column i, if the column name b[i] is a key in array a, print the column followed by a tab.
To finish, print "" ends the line with a newline.
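Two side effects of the one-liner are worth knowing about: every output line ends with a trailing tab, and length(v) on an array is an extension that some older awk implementations lack. The sketch below is one possible variant, not the answer's own method. It uses the count returned by split(), prints the columns in the order they were requested, and silently skips names that are not in the header. The sample data and the names `want`, `pos` and `keep` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample data, tab-separated.
tmp=$(mktemp)
printf 'col1\tcol2\tcol3\tcol4\n2\t5\t3\t1\n6\t4\t7\t1\n' > "$tmp"

# "nope" is deliberately absent from the header to show that it is skipped.
result=$(awk -v s="col3 nope col1" '
BEGIN { FS = OFS = "\t"; n = split(s, want, " ") }    # split() returns the element count
NR == 1 {
    for (i = 1; i <= NF; i++) pos[$i] = i             # header name -> column number
    for (j = 1; j <= n; j++)
        if (want[j] in pos) keep[++m] = pos[want[j]]  # unknown names are skipped
}
{
    # Separator between columns, record terminator after the last one,
    # so no trailing tab is emitted.
    for (k = 1; k <= m; k++)
        printf "%s%s", $(keep[k]), (k < m ? OFS : ORS)
}' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the column order comes from the requested list rather than from the file, asking for "col3 col1" prints col3 first.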
Upvotes: 1