Trying to combine awk and zcat with multiple filtering criteria

Question

I have a very large file (40M rows x 400 columns).

Its structure looks like this:

chr  pos  snp
1   1   rs500
2   4   rs501
2   6   rs502
17   6   rs503

The file is named myfile.gz.

To search the 3rd column for a given value, the following works:

zcat myfile | grep rs500$
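One caveat with the grep version: `rs500$` matches any line that merely ends in rs500, so rows for rs1500 or rs2500 would match too. Below is a minimal sketch of an exact match on the 3rd column with awk. It assumes the file is tab-separated, which the od dump in the edit suggests. `find_snp` is a hypothetical helper name, and the small sample file exists only so the sketch runs on its own:

```shell
# Tiny tab-separated, gzip-compressed stand-in for myfile.gz.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'chr\tpos\tsnp\n1\t1\trs500\n2\t4\trs501\n3\t8\trs1500\n17\t6\trs503\n' \
  | gzip > "$tmpdir/sample.tsv.gz"

# Print rows whose 3rd tab-separated field equals the given rsID exactly.
# -v hands the shell value to awk as a variable; '$3 == id' with no action
# uses awk's default action, which prints the whole line.
find_snp() {
  zcat "$1" | awk -F'\t' -v id="$2" '$3 == id'
}

find_snp "$tmpdir/sample.tsv.gz" rs500   # prints only the 1/1/rs500 row, not rs1500
```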

However, to filter on two criteria, say chr = 17 and pos = 6, I tried the following, but I can't get it to return anything:

zcat myfile | awk '{ if ($1 == 17 && $2 == 6) print }'

There is no error, but nothing is returned either. I have done this kind of filtering before, with no issue, when the file wasn't gzip-compressed.

For example, this command, run on a different and much larger file, keeps the header line plus the rows that meet criteria on two columns:

awk '{ if (NR == 1 || ($39 >= 0.03 && $36 <= 1e-04)) print }' myfile.notgzcompressed

But I can't seem to combine that syntax with zcat, which I need because I don't want to decompress my huge file.
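zcat only writes the decompressed text to standard output, so the awk program itself does not need to change. Drop the filename from awk's arguments and pipe zcat's output into awk. Below is a minimal sketch that runs the same header-keeping filter on a plain copy and a gzip-compressed copy of the sample rows from the question. It uses -F'\t' on the assumption, supported by the od dump in the edit, that the real file is tab-separated. The file names are stand-ins:

```shell
# Plain and gzip-compressed copies of the sample rows from the question.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'chr\tpos\tsnp\n1\t1\trs500\n2\t4\trs501\n2\t6\trs502\n17\t6\trs503\n' > "$tmpdir/myfile"
gzip -c "$tmpdir/myfile" > "$tmpdir/myfile.gz"

# Keep the header (NR == 1) plus rows with chr 17 and pos 6.
# A bare pattern is equivalent to '{ if (...) print }': the default action prints.
prog='NR == 1 || ($1 == 17 && $2 == 6)'

# Uncompressed: awk reads the file named on its command line.
awk -F'\t' "$prog" "$tmpdir/myfile"

# Compressed: zcat decompresses to stdout and awk reads its stdin instead.
zcat "$tmpdir/myfile.gz" | awk -F'\t' "$prog"
```

On these rows, both commands print the header and the 17/6/rs503 row. Awk's default field splitting (runs of blanks) gives the same result here. -F'\t' only matters if a field can be empty or contain spaces.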

EDIT: adding information based on the comments:
zcat myfile.gz | head -2 | od -c
0000000   c   h   r  \t   p   o   s  \t   r   e   f  \t   a   l   t  \t
0000020   c   h   r   _   h   g   1   9  \t   p   o   s   _   h   g   1
0000040   9  \t   r   e   f   _   h   g   1   9  \t   a   l   t   _   h
0000060   g   1   9  \t   V   E   P   _   e   n   s   e   m   b   l   _
0000100   s   u   m   m   a   r   y  \t   r   s   _   d   b   S   N   P
0000120   1   5   1  \n   1  \t   1   0   1   8   0  \t   T  \t   C  \t
0000140   1  \t   1   0   1   8   0  \t   T  \t   C  \t   W   A   S   H
0000160   7   P   (   1   )   :   d   o   w   n   s   t   r   e   a   m
0000200   _   g   e   n   e   _   v   a   r   i   a   n   t   (   1   )
0000220   |   D   D   X   1   1   L   1   (   2   )   :   u   p   s   t
0000240   r   e   a   m   _   g   e   n   e   _   v   a   r   i   a   n
0000260   t   (   2   )  \t   r   s   2   0   1   6   9   4   9   0   1
0000300
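The dump shows tab characters between the fields and a header line (chr, pos, ref, alt, chr_hg19, pos_hg19, ...). That puts chr and pos in columns 1 and 2. Below is a minimal sketch for double-checking column numbers against the real header. `list_columns` is a hypothetical helper name, and the sample file only mirrors the first fields of the dump above:

```shell
# Stand-in for myfile.gz with the first header fields and data row from the dump.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'chr\tpos\tref\talt\tchr_hg19\tpos_hg19\n1\t10180\tT\tC\t1\t10180\n' \
  | gzip > "$tmpdir/myfile.gz"

# Print "column-number name" for each tab-separated header field, then exit.
# Exiting after line 1 also stops zcat early, so the whole file is not read.
list_columns() {
  zcat "$1" | awk -F'\t' 'NR == 1 { for (i = 1; i <= NF; i++) print i, $i; exit }'
}

list_columns "$tmpdir/myfile.gz"   # prints "1 chr", "2 pos", ... one per line
```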

For more context: I am using R's data.table::fread() to pass commands like this, so that the Unix tools do the filtering before the data is loaded into the R environment. The chr and pos lookup values have already been assigned:

fread(cmd = paste0("zcat ", myfile, " | awk ","'{ if ($1  == ", chr ," && $2 == ",pos,") print }'")) -> h2
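Instead of splicing chr and pos into the awk program text with paste0(), the values can be passed to awk as variables with -v. This avoids quoting mistakes when the command string is assembled in R. Below is a minimal shell sketch for testing the command outside R first. `filter_chr_pos` is a hypothetical helper, the header and 1/10180 row mirror the od dump, and the 17/6 row is invented for illustration:

```shell
# Stand-in for myfile.gz; the 17/6/A/G row is made up for the example.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'chr\tpos\tref\talt\n1\t10180\tT\tC\n17\t6\tA\tG\n' | gzip > "$tmpdir/myfile.gz"

# Hypothetical helper: header line plus rows where column 1 == chr and column 2 == pos.
# Numeric-looking values compare numerically, so "17" matches a field of 17.
filter_chr_pos() {
  zcat "$1" | awk -F'\t' -v chr="$2" -v pos="$3" \
    'NR == 1 || ($1 == chr && $2 == pos)'
}

filter_chr_pos "$tmpdir/myfile.gz" 17 6
```

Keeping NR == 1, as the earlier non-gzip command does, also gives fread() a header line to take column names from.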
