print columns based on column name

Question

Let's say I have a file test.txt that contains

a,b,c,d,e
1,2,3,4,5
6,7,8,9,10

I want to print out columns based on matching column names, either from another text file or from an array. So for example if I was given

arr=(a b c)

I want my output to then be

a,b,c
1,2,3
6,7,8

How can I do this with bash utilities/awk/sed? My actual text file is 3GB (and the line I want to match column values on is actually line 3), so efficient solutions are appreciated. This is what I have so far:

for j in "${arr[@]}"; do awk -F ',' -v a=$j '{ for(i=1;i<=NF;i++) {if($i==a) {print $i}}}' test.txt; done

but the output I get is

a
b
c

which not only is missing the other rows, but each column name is printed on one line each.

RavinderSingh13 · Accepted Answer

With your shown samples, please try following. Code is reading 2 files another_file.txt(which has a b c as per samples) and actual Input_file named test.txt(which has all the values in it).

awk '
FNR==NR{
  for(i=1;i<=NF;i++){
    arr[$i]
  }
  next
}
FNR==1{
  for(i=1;i<=NF;i++){
    if($i in arr){
      valArr[i]
      header=(header?header OFS:"")$i
    }
  }
  print header
  next
}
{
  val=""
  for(i=1;i<=NF;i++){
    if(i in valArr){
       val=(val?val OFS:"")$i
    }
  }
  print val
}
' another_file.txt FS="," OFS="," test.txt

Output will be as follows:

a,b,c
1,2,3
6,7,8

Explanation: Adding detailed explanation for above solution.

awk '                                         ##Starting awk program from here.
FNR==NR{                                      ##Checking condition which will be TRUE while reading another_text file here.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    arr[$i]                                   ##Creating arr with index of current field value.
  }
  next                                        ##next will skip all statements from here.
}
FNR==1{                                       ##Checking if this is 1st line for test.txt file.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    if($i in arr){                            ##If current field values comes in arr then do following.
      valArr[i]                               ##Creating valArr which has index of current field number.
      header=(header?header OFS:"")$i         ##Creating header which has each field value in it.
    }
  }
  print header                                ##Printing header here.
  next                                        ##next will skip all statements from here.
}
{
  val=""                                      ##Nullifying val here.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    if(i in valArr){                          ##Checking if i is present in valArr then do following.
       val=(val?val OFS:"")$i                 ##Creating val which has current field value.
    }
  }
  print val                                   ##printing val here.
}
' another_file.txt FS="," OFS="," test.txt    ##Mentioning Input_file names here.

print columns based on column name

Answers (2)

Related Questions