In the input file below, I am using awk to print out the lines where $5 is blank. The awk runs and outputs results, but it prints the entire input file, not just the lines where $5 is blank. My awk version is GNU Awk 4.0.1. Thank you :)
input
chr6 32945523 32945792 chr6:32945523-32945792 BRD2-351|gc=50
chr6 32945892 32946175 chr6:32945892-32946175 BRD2-352|gc=53.5
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 32947919 chr6:32947594-32947919 BRD2-354|gc=51.2
desired result
chr6 32946856 32946981 chr6:32946856-32946981
awk
cat input | awk 'BEGIN {FS="\t"} $5=="" {print}'
current output
cat input | awk 'BEGIN {FS="\t"} $5=="" {print}'
chr6 32945523 32945792 chr6:32945523-32945792 BRD2-351|gc=50
chr6 32945892 32946175 chr6:32945892-32946175 BRD2-352|gc=53.5
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 32947919 chr6:32947594-32947919 BRD2-354|gc=51.2
chr6 32948108 32948251 chr6:32948108-32948251 BRD2-355|gc=43
Edit: the awk below works, but I'm not sure why the original did not:
awk '$5==""' input
Answer:
I'm not sure why you're specifying a field separator (FS) of tab (\t). That should only be necessary if you have a TSV file (tab-separated values, similar to CSV). If you do indeed have a TSV file, meaning there are spaces in some values and/or two consecutive tabs indicate an empty field in the middle, you need awk 'BEGIN {FS="\t"} …' or the shorter awk -F '\t' '…'.
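If the file is actually space-separated (the sample in the question displays that way, although tabs may have been lost when posting), FS="\t" explains why the original command printed every line: with no tab on a line, the whole line becomes $1, NF is 1, and $5 == "" is true everywhere. A minimal check in POSIX sh, using one sample line as a hypothetical space-separated record:

```shell
# Hypothetical space-separated record (no tab characters), as the question's data appears
line='chr6 32945523 32945792 chr6:32945523-32945792 BRD2-351|gc=50'

# FS is a tab: there is no tab to split on, so NF is 1 and $5 is empty
printf '%s\n' "$line" | awk -F '\t' '{ print NF, ($5 == "" ? "(empty)" : $5) }'

# Default FS: runs of blanks separate fields, so NF is 5 and $5 is the BRD2 value
printf '%s\n' "$line" | awk '{ print NF, ($5 == "" ? "(empty)" : $5) }'
```

The first command prints 1 (empty) and the second prints 5 BRD2-351|gc=50, which is why dropping FS="\t", as in awk '$5==""' input from the question's edit, works on space-separated data.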
Try this:
awk 'NF < 5' input
If you have a TSV format that includes some empty fields, try this:
awk -F '\t' '$5 == ""' input
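Applied to the four sample lines from the question, NF < 5 keeps only the short line. A minimal sketch, assuming the fields are separated by spaces as they appear in the post (the input variable is just for illustration):

```shell
# The question's four sample lines; separators assumed to be single spaces
input=$(cat <<'EOF'
chr6 32945523 32945792 chr6:32945523-32945792 BRD2-351|gc=50
chr6 32945892 32946175 chr6:32945892-32946175 BRD2-352|gc=53.5
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 32947919 chr6:32947594-32947919 BRD2-354|gc=51.2
EOF
)

# Default FS splits on runs of blanks, so only the short line has fewer than 5 fields
printf '%s\n' "$input" | awk 'NF < 5'

# With the default FS, testing $5 directly selects the same line
printf '%s\n' "$input" | awk '$5 == ""'
```

Both commands print only chr6 32946856 32946981 chr6:32946856-32946981, the desired result from the question.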
Because tabs don't survive in the HTML of this page, here's a more reliable test that generates its own tab-separated input:
sample() {
  # printf expands \t in any POSIX shell; echo only does so in some shells (e.g. dash)
  printf 'chr6\t32945523\t32945792\tchr6:32945523-32945792\tBRD2-351|gc=50\n'
  printf 'chr6\t32945892\t32946175\tchr6:32945892-32946175\tBRD2-352|gc=53.5\n'
  printf 'chr6\t32946856\t32946981\tchr6:32946856-32946981\n'
  printf 'chr6\t32947594\t32947919\tchr6:32947594-32947919\tBRD2-354|gc=51.2\n'
  printf 'chr6\t32947594\t32947919\tchr6:32947594-32947919\t\ttest\n'   # empty 5th field
  printf 'chr6\t32947594\t\tchr6:32947594-32947919\tBRD2-354|gc=51.2\n'  # empty 3rd field
}
echo "unfiltered"
sample
echo "testing awk 'NF < 5'"
sample | awk 'NF < 5'
printf '\n%s\n' "testing awk -F '\\t' '\$5 == \"\"'"
sample | awk -F '\t' '$5 == ""'
The last two lines of sample() illustrate the difference between awk's default field splitting (FS=" ", which treats any run of spaces and tabs as a single separator and ignores leading and trailing blanks) and FS="\t".
With the default, you'll get that short line plus the final line, because the two consecutive tabs around the final line's empty third field collapse into a single separator (so TSV field 5 is awk field 4). In the "test" line, the empty fifth field collapses the same way, so TSV field 6 becomes awk field 5 and the default misses it.
The tab field separator also gets that short line. Because it counts fields the TSV way, it sees that the "test" line has an empty fifth field ("test" is its sixth), and it treats the final line's missing third field as empty rather than collapsing it, so the "BRD2" value stays the fifth TSV field and that line is correctly left out.
unfiltered
chr6 32945523 32945792 chr6:32945523-32945792 BRD2-351|gc=50
chr6 32945892 32946175 chr6:32945892-32946175 BRD2-352|gc=53.5
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 32947919 chr6:32947594-32947919 BRD2-354|gc=51.2
chr6 32947594 32947919 chr6:32947594-32947919 test
chr6 32947594 chr6:32947594-32947919 BRD2-354|gc=51.2
testing awk 'NF < 5'
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 chr6:32947594-32947919 BRD2-354|gc=51.2
testing awk -F '\t' '$5 == ""'
chr6 32946856 32946981 chr6:32946856-32946981
chr6 32947594 32947919 chr6:32947594-32947919 test
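The two records that tell the filters apart can also be checked one at a time. This is a sketch assuming a POSIX shell and awk; the variable names are hypothetical. It prints NF and $5 for each record under both field separators:

```shell
# The last two records from sample(), built with printf so the tabs are real
short_f3=$(printf 'chr6\t32947594\t\tchr6:32947594-32947919\tBRD2-354|gc=51.2')
empty_f5=$(printf 'chr6\t32947594\t32947919\tchr6:32947594-32947919\t\ttest')

for rec in "$short_f3" "$empty_f5"; do
  # Default FS: a run of tabs counts as one separator, so empty fields vanish
  printf '%s\n' "$rec" | awk '{ printf "default FS: NF=%d $5=\"%s\"\n", NF, $5 }'
  # Tab FS: every tab is a separator, so empty fields keep their position
  printf '%s\n' "$rec" | awk -F '\t' '{ printf "tab FS:     NF=%d $5=\"%s\"\n", NF, $5 }'
done
```

For the empty-third-field record this prints NF=4 $5="" with the default FS and NF=5 $5="BRD2-354|gc=51.2" with the tab FS; for the "test" record it prints NF=5 $5="test" and NF=6 $5="".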