august
august

Reputation: 747

print columns based on column name

Let's say I have a file test.txt that contains

a,b,c,d,e
1,2,3,4,5
6,7,8,9,10

I want to print out columns based on matching column names, either from another text file or from an array. So for example if I was given

arr=(a b c)

I want my output to then be

a,b,c
1,2,3
6,7,8

How can I do this with bash utilities/awk/sed? My actual text file is 3GB (and the line I want to match column values on is actually line 3), so efficient solutions are appreciated. This is what I have so far:

for j in "${arr[@]}"; do awk -F ',' -v a=$j '{ for(i=1;i<=NF;i++) {if($i==a) {print $i}}}' test.txt; done

but the output I get is

a
b
c

which not only is missing the other rows, but each column name is printed on one line each.

Upvotes: 1

Views: 387

Answers (2)

RavinderSingh13
RavinderSingh13

Reputation: 133458

With your shown samples, please try following. Code is reading 2 files another_file.txt(which has a b c as per samples) and actual Input_file named test.txt(which has all the values in it).

awk '
FNR==NR{
  for(i=1;i<=NF;i++){
    arr[$i]
  }
  next
}
FNR==1{
  for(i=1;i<=NF;i++){
    if($i in arr){
      valArr[i]
      header=(header?header OFS:"")$i
    }
  }
  print header
  next
}
{
  val=""
  for(i=1;i<=NF;i++){
    if(i in valArr){
       val=(val?val OFS:"")$i
    }
  }
  print val
}
' another_file.txt FS="," OFS="," test.txt

Output will be as follows:

a,b,c
1,2,3
6,7,8

Explanation: Adding detailed explanation for above solution.

awk '                                         ##Starting awk program from here.
FNR==NR{                                      ##Checking condition which will be TRUE while reading another_text file here.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    arr[$i]                                   ##Creating arr with index of current field value.
  }
  next                                        ##next will skip all statements from here.
}
FNR==1{                                       ##Checking if this is 1st line for test.txt file.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    if($i in arr){                            ##If current field values comes in arr then do following.
      valArr[i]                               ##Creating valArr which has index of current field number.
      header=(header?header OFS:"")$i         ##Creating header which has each field value in it.
    }
  }
  print header                                ##Printing header here.
  next                                        ##next will skip all statements from here.
}
{
  val=""                                      ##Nullifying val here.
  for(i=1;i<=NF;i++){                         ##Traversing through all fields of current line.
    if(i in valArr){                          ##Checking if i is present in valArr then do following.
       val=(val?val OFS:"")$i                 ##Creating val which has current field value.
    }
  }
  print val                                   ##printing val here.
}
' another_file.txt FS="," OFS="," test.txt    ##Mentioning Input_file names here.

Upvotes: 6

anubhava
anubhava

Reputation: 785098

Here is how you can do this in a single pass awk command:

arr=(a c e)
awk -v cols="${arr[*]}" 'BEGIN {FS=OFS=","; n=split(cols, tmp, / /); for (i=1; i<=n; ++i) hdr[tmp[i]]} NR==1 {for (i=1; i<=NF; ++i) if ($i in hdr) hnum[i]} {for (i=1; i<=NF; ++i) if (i in hnum) {printf "%s%s", (f ? OFS : ""), $i; f=1} f=0; print ""}' file

a,c,e
1,3,5
6,8,10

A more readable form:

awk -v cols="${arr[*]}" '
BEGIN {
   FS = OFS = ","
   n = split(cols, tmp, / /)
   for (i=1; i<=n; ++i)
      hdr[tmp[i]]
}
NR == 1 {
   for (i=1; i<=NF; ++i)
      if ($i in hdr)
         hnum[i]
}
{
   for (i=1; i<=NF; ++i)
      if (i in hnum) {
         printf "%s%s", (f ? OFS : ""), $i
         f = 1
      }
      f = 0
      print ""
}' file

Upvotes: 3

Related Questions