Reputation: 663
I have a very large tab-separated table (24 GB in size) with columns C1, C2, C3 and C4, as shown below. I would like to extract the rows that have C1 < 0.6 and C2 < 0.4. How do I do this in a Unix shell using logical operators?
C1 C2 C3 C4
0.8 0.1 A1 C.a
0.2 0.3 A2 C.b
0.5 0.8 A3 C.c
0.1 0.1 A4 C.c
Result I expect:
C1 C2 C3 C4
0.2 0.3 A2 C.b
0.1 0.1 A4 C.c
Upvotes: 0
Views: 945
Reputation: 133458
1st solution: This simple awk should do the job for you.
awk 'FNR==1 || ($1<.6 && $2<.4)' Input_file
Or, for a tab-separated Input_file, try the following:
awk 'BEGIN{FS=OFS="\t"}FNR==1 || ($1<.6 && $2<.4)' Input_file
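To check the filter before running it on the full 24 GB file, it can be tried on the sample rows from the question. This is only a small sketch: the temporary file created with mktemp and the variable names sample and result are assumptions made for the check, not part of the original data.

```shell
# Recreate the question's sample table as a tab-separated temporary file.
sample=$(mktemp)
printf 'C1\tC2\tC3\tC4\n0.8\t0.1\tA1\tC.a\n0.2\t0.3\tA2\tC.b\n0.5\t0.8\tA3\tC.c\n0.1\t0.1\tA4\tC.c\n' > "$sample"

# Keep the header line (FNR==1) plus every row with C1 < 0.6 and C2 < 0.4.
result=$(awk 'BEGIN{FS=OFS="\t"} FNR==1 || ($1<.6 && $2<.4)' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

On this sample it should print the header followed by the A2 and A4 rows, which matches the result expected in the question. Since awk reads the input one line at a time, the size of the real file should not affect memory use.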
2nd solution (generic): If you don't want to hard-code the field numbers of columns C1 and C2 and would rather look them up programmatically from the header line, try the following. Add BEGIN{FS=OFS="\t"} to it if your Input_file is tab-delimited.
awk -v c1Thre="0.6" -v c2Thre="0.4" '
FNR==1{
  for(i=1;i<=NF;i++){
    if($i=="C1"){ C1Field=i }
    if($i=="C2"){ C2Field=i }
  }
  print
  next
}
$C1Field<c1Thre && $C2Field<c2Thre
' Input_file
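One thing to be aware of with a header lookup: if a column name is not found, its field variable stays unset, and awk then compares $0 (the whole line) instead of a column, so the filter fails silently. A possible guard is sketched below. The variables c1Name and c2Name, the error message, and the sample file are assumptions added for illustration, and writing to "/dev/stderr" assumes an awk or system that provides it (gawk, mawk and Linux do).

```shell
# Two-row sample table; the temporary file is hypothetical test data.
sample=$(mktemp)
printf 'C1\tC2\tC3\tC4\n0.8\t0.1\tA1\tC.a\n0.2\t0.3\tA2\tC.b\n' > "$sample"

# Ask for a column "C9" that does not exist, to show the guard firing.
status=0
msg=$(awk -v c1Name="C9" -v c2Name="C2" -v c1Thre="0.6" -v c2Thre="0.4" '
BEGIN { FS = OFS = "\t" }
FNR == 1 {
  # Find the field numbers of the requested columns in the header.
  for (i = 1; i <= NF; i++) {
    if ($i == c1Name) c1Field = i
    if ($i == c2Name) c2Field = i
  }
  # Stop early instead of silently comparing against $0.
  if (!c1Field || !c2Field) {
    print "missing column in header" > "/dev/stderr"
    exit 1
  }
  print
  next
}
$c1Field < c1Thre && $c2Field < c2Thre
' "$sample" 2>&1) || status=$?

rm -f "$sample"
printf 'exit status: %s, message: %s\n' "$status" "$msg"
```

With c1Name set back to "C1" the guard does not trigger and the command behaves like the 2nd solution above.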
Upvotes: 1
Reputation: 961
Try this. I have squeezed the runs of spaces between the columns (there are 3 or 4 spaces each) into a single "," for processing:
tr -s " " "," < mydata.txt | awk -F"," 'NR == 1 || ($1 < 0.6 && $2 < 0.4)'
Upvotes: 0