star
star

Reputation: 213

Extract desired column with values

Please help me with this small script I am making I am trying to grep some columns with values from a big file (tabseparated) (mainFileWithValues.txt) which has this format:

A   B    C  ......... (total 700 columns)
80  2.08  23  
14 1.88  30  
12 1.81 40

Column names are in column.nam

cat  columnnam.nam

A
B
.
.
.

till 20 nmes

I am first taking column number from a big file using:

sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c

Then using cut I am extracting values

I have made a for loop

#/bin/bash

for i in `cat columnnam.nam`    
do    
  cut -f`sed -n "1 s/${i}.*//p" mainFileWithValues.txt | sed 's/[^\t*]//g' |wc -c` mainFileWithValues.txt > test.txt    
done


cat test.txt    
A    
80    
14    
12    
B    
2.08    
1.88    
1.81

my problem is I want output test.txt to be in columns like main file. i.e.

A   B 
80  2.08

How can I fix this in this script?

Upvotes: 2

Views: 491

Answers (2)

Akshay Hegde
Akshay Hegde

Reputation: 16997

Here is one-liner:

awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"",  i in d ?$(d[i]):"")}print ""}' columns.nam mainfile 

Explanation:

[ note : case insensitive header match, remove tolower(), if you want strict match ]

awk '
    FNR==NR{                       # Here we read columns.nam file
       h[NR]=$1;                   # h -> array, NR -> as array key, $1 -> as array value
       next                        # go to next line
    }
    {                              # Here we read second file

     for(i=1; i in h; i++)         # iterate array h
     {
       if(FNR==1)                  # if we are reading 1st row of second file, will parse header
       {
        for(j=1; j<=NF; j++)       # iterate over fields of 1st row fields
        {
            # if it was the field we are looking for
            if(tolower(h[i])==tolower($j))
            {
              # then 
              # d -> array, i -> as array key which is column order number
              # j -> as array value which is column number
              d[i]=j; 
              break 
            }
        }
       }    
       # for all records
       # if field we searched was found then print such field
       # from d[i] we access, column number

       printf("%s%s",i>1 ? OFS:"",  i in d ? $(d[i]): "");
      }

      # print newline char
      print ""
    }
    ' columns.nam mainfile 

Test Results:

$ cat mainfile 
A   B    C  
80  2.08  23  
14 1.88  30  
12 1.81 40

$ cat columns.nam 
A
C

$ awk 'FNR==NR{h[NR]=$1;next}{for(i=1; i in h; i++){if(FNR==1){for(j=1; j<=NF; j++){if(tolower(h[i])==tolower($j)){d[i]=j; break }}}printf("%s%s",i>1 ? OFS:"",  i in d ?$(d[i]):"")}print ""}' columns.nam mainfile 
A C
80 23
14 30
12 40

You can also make script and run

akshay@db-3325:/tmp$ cat col_parser.awk 
FNR == NR {
  h[NR] = $1;
  next
} 
{
  for (i = 1; i in h; i++) {
    if (FNR == 1) {
      for (j = 1; j <= NF; j++) {
        if (tolower(h[i]) == tolower($j)) {
          d[i] = j;
          break
        }
      }
    }
    printf("%s%s", i > 1 ? OFS : "", i in d ? $(d[i]) : "");
  }
  print ""
}

akshay@db-3325:/tmp$ awk -v OFS="\t" -f col_parser.awk columns.nam mainfile 
A      C
80     23
14     30
12     40

Similar Answer

Upvotes: 4

anubhava
anubhava

Reputation: 786289

Another awk approach:

awk 'NR == FNR {
   hdr[$1]
   next
}
FNR == 1 {
   for (i=1; i<=NF; i++)
      if ($i in hdr)
         h[i]
}
{
   s=""
   for (i in h)
      s = s (s == "" ? "" : OFS) $i
   print s
}' column.nam mainFileWithValues.txt

A B
80 2.08
14 1.88
12 1.81

To get formatted output pipe above command to column -t

Upvotes: 3

Related Questions