lordlabakdas

Reputation: 1193

Return whole column of tab-separated CSV file that has grep match

Let's say I have a tab-separated CSV file as below:

a b c  
d e f  
g h i  

Using command-line utilities, is there a way to return the whole column that contains a match for a given grep pattern? In the example above, I would like to return the second column when grepping for b.

Upvotes: 0

Views: 441

Answers (3)

user1666959

Reputation: 1855

You really need to give a hint about how big your files are, how often you want to run this, and how many columns you have. But:

  1. grep is fast(er than awk)
  2. unless your files are huge, they are likely to be cached (so it is OK to read them twice)

Based on the observations above, I would

  1. grep the file for the pattern (pipe the result to uniq if needed)
  2. calculate which columns are needed from grep's output
  3. run awk with -vCOLS="c1 c2 c3..." and a trivial script which prints the columns specified by c1, c2...
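The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the function name print_matching_cols is hypothetical, the pattern is treated as an extended regex by both grep and awk, and the column numbers are computed with a small awk pass over grep's output rather than by any particular method the answer had in mind.

```shell
# Hypothetical helper: print every tab-separated column containing a match.
print_matching_cols() {
    pat=$1
    file=$2

    # Steps 1 and 2: let grep discard non-matching lines quickly, then work
    # out which field numbers match, in ascending order ("2 3", for example).
    cols=$(grep -E -e "$pat" "$file" |
        awk -F'\t' -v pat="$pat" '
            {
                if (NF > max) max = NF
                for (i = 1; i <= NF; i++) if ($i ~ pat) seen[i] = 1
            }
            END {
                out = ""
                for (i = 1; i <= max; i++)
                    if (i in seen) out = out (out == "" ? "" : " ") i
                print out
            }')

    # Nothing matched: report failure instead of printing empty lines.
    [ -n "$cols" ] || return 1

    # Step 3: second read of the file, printing only the selected columns.
    awk -F'\t' -v COLS="$cols" '
        BEGIN { n = split(COLS, c, " ") }
        {
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? "\t" : "") $(c[j])
            print line
        }' "$file"
}
```

Called as print_matching_cols b file, this would print the second column of the sample file in the question.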

Upvotes: 0

Kent

Reputation: 195129

 awk -F'\t' -v pat="b" 'NR==FNR{for(i=1;i<=NF;i++)if($i~pat)c[i];next}
                        {s="";for(i=1;i<=NF;i++)
                         if(i in c)s=s sprintf("%s\t", $i);
                         sub(/\t$/,"",s);print s}' file file

This command does the job.

  • It prints every column that contains a match for pat, and keeps the column format.
  • pat is a regex; you can pass a shell variable to the awk command via -v.
  • The output follows the original column order.

Take a look at the example (I added a b to your 3rd column to show the multiple-match case):

kent$  cat f
a       b       c
d       e       b
g       h       i

kent$  awk -F'\t' -v pat="b" 'NR==FNR{for(i=1;i<=NF;i++)if($i~pat)c[i];next}{s="";for(i=1;i<=NF;i++)if(i in c)s=s sprintf("%s\t", $i);sub(/\t$/,"",s);print s}' f f
b       c
e       b
h       i
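As a small sketch of passing a shell variable as the pattern: the variable name pattern and the temporary sample file are assumptions made for illustration, and the data mirrors the file f above. Anchoring the regex with ^ and $ restricts the match to whole fields. Note that awk processes backslash escapes in -v assignments, so patterns containing backslashes may need extra care.

```shell
# Hypothetical sample data mirroring the example file f.
f=$(mktemp)
printf 'a\tb\tc\nd\te\tb\ng\th\ti\n' > "$f"

# The pattern lives in an ordinary shell variable; -v copies it into pat.
pattern='^b$'

result=$(awk -F'\t' -v pat="$pattern" '
    # First pass: record the number of every field that matches pat.
    NR == FNR { for (i = 1; i <= NF; i++) if ($i ~ pat) c[i]; next }
    # Second pass: join the recorded columns with tabs, in original order.
    {
        s = ""
        for (i = 1; i <= NF; i++) if (i in c) s = s sprintf("%s\t", $i)
        sub(/\t$/, "", s)
        print s
    }' "$f" "$f")

printf '%s\n' "$result"
rm -f "$f"
```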

Upvotes: 1

fedorqui

Reputation: 289835

If there is just one match, you can do, for example, this:

$ awk -v patt="b" 'FNR==NR {for (i=1;i<=NF;i++) $i~patt && col=i; next} {print $col}' file file
b
e
h

Explanation

It reads the file twice: first to get the column number of the matched text, then to print that specific column.

  • -v patt="b" sets the pattern.
  • FNR==NR {for (i=1;i<=NF;i++) $i~patt && col=i; next} runs on the first read: it loops through the fields and checks whether the pattern matches. If so, it stores the column number in the col variable.
  • {print $col} runs on the second read and prints that column for every line.
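A slightly more defensive variant of the same two-pass idea might look like the sketch below. It is not part of the original command: it assumes an explicit tab separator, keeps only the first matching column, and prints nothing when there is no match (in the original, an unset col makes $col equal to $0, so whole lines would be printed). The temporary sample file is hypothetical and mirrors the data in the question.

```shell
# Hypothetical sample data matching the question.
sample=$(mktemp)
printf 'a\tb\tc\nd\te\tf\ng\th\ti\n' > "$sample"

column=$(awk -F'\t' -v patt="b" '
    # First pass: remember the first field number that matches patt.
    FNR == NR {
        if (!col)
            for (i = 1; i <= NF; i++)
                if ($i ~ patt) { col = i; break }
        next
    }
    # Second pass: print only when a column was found.
    col { print $col }
' "$sample" "$sample")

printf '%s\n' "$column"
rm -f "$sample"
```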

Upvotes: 1
