Reputation: 11377
I have roughly 30 GB of text files that have 6 lines of header and then content that can be best imagined as a matrix, from tiny 1x1 to having tens of thousands of rows and columns. Numbers in the content can take only two values: 0 and 1.
I would like to find all files whose content is entirely zeros, i.e. contains not a single '1' value. Writing a script in, say, Python should be straightforward, but I would like to learn how to do this in e.g. awk, grep or sed.
One way I can think of is to use grep to search for '1'; if it is not found in a given file then we have a match (since there are only two possible values). But how can I search from a specific line, i.e. skip the header?
Upvotes: 1
Views: 570
Reputation: 203324
awk -F'1' '
    FNR>6 && NF>1 { f=1; nextfile }
    ENDFILE { print FILENAME, (f ? "got a one" : "all zeros"); f=0 }
' file1 file2 ...
The above uses GNU awk for ENDFILE and nextfile.
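If gawk is not available, a rough portable sketch is to run awk once per file and stop reading at the first 1. The function name all_zeros and its optional header-length argument are placeholders of mine, not part of the answer above; the default of 6 comes from the question.

```shell
# all_zeros FILE [HEADER_LINES]: exit 0 if no 1 appears after the header,
# exit 1 otherwise. Hypothetical helper; HEADER_LINES defaults to 6.
all_zeros() {
    awk -v skip="${2:-6}" '
        FNR > skip && /1/ { found = 1; exit }   # exit jumps straight to END
        END { exit (found ? 1 : 0) }
    ' "$1"
}

# Usage: for file in *; do all_zeros "$file" && printf "%s\n" "$file"; done
```

This starts one awk process per file, which is slower than a single gawk run over many small files, but it only uses POSIX awk features.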
Upvotes: 1
Reputation: 246774
My take:
for file in *; do
    if sed 1,6d "$file" | grep -q 1; then
        echo "$file has a one"
    else
        echo "$file has no ones"
    fi
done
With GNU sed, whose q command accepts an exit status, you can drop grep and write
for file in *; do
    if sed -n '1,6d; /1/ q 1' "$file"; then
        echo "$file has no ones"
    else
        echo "$file has a one"
    fi
done
Upvotes: 1
Reputation: 63902
The following pipeline counts the total number of 1s in the given file, i.e. not just the number of lines that contain a 1 but the actual number of 1s across all lines and columns:
file="somefile.txt"
tail -n +7 "$file" | grep -o 1 | grep -c '.'
^^^^^^^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^
        |                |            +--- count the number of lines
        |                +---- print every "1" on a line of its own
        +-------- print the file from the 7th line on
you can use it like
file="somefile"
ones=$(tail -n +7 "$file" | grep -o 1 | grep -c '.')
case "$ones" in
0) do_something "$file" ;; #no 1 in the file
*) do_other "$file" "$ones" ;; #here is $ones number of "1"
esac
You can also count the 1s with perl (0+$c makes it print 0 rather than an empty line when there are none):
perl -nlE '$.<7&&next;$c+=()=m/1/g}{say 0+$c' < filename
e.g.
ones=$(perl -nlE '$.<7&&next;$c+=()=m/1/g}{say 0+$c' < filename)
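The same count can be sketched in plain awk, relying on the fact that gsub returns the number of substitutions it made. The helper name count_ones and its optional header-length argument are hypothetical; the default of 6 is taken from the question.

```shell
# count_ones FILE [HEADER_LINES]: print how many 1 characters follow the header.
# Hypothetical helper; HEADER_LINES defaults to 6.
count_ones() {
    awk -v skip="${2:-6}" '
        FNR > skip { total += gsub(/1/, "1") }   # gsub returns the number of replacements
        END        { print total + 0 }           # + 0 prints 0 instead of an empty line
    ' "$1"
}

# Usage: ones=$(count_ones somefile.txt)
```

Unlike the early-exit approaches, this has to read the whole file, so it is only worth it when the actual count is needed.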
Upvotes: 0
Reputation: 103774
Suppose I have the two files:
$ cat 1_1.txt
Header 1
Header 2
Header 3
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
$ cat zereos.txt
Header 1
Header 2
Header 3
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
You can use sed to skip over the N header lines and then print any lines with a 1 in them:
$ sed -n '1,3d; /1/p' zereos.txt
$ sed -n '1,3d; /1/p' 1_1.txt
0 0 0 1 0
So now combine that into a Bash script:
for file in *; do
    rtr=$(sed -n '1,3d; /1/p' "$file")
    if [[ -z $rtr ]]; then
        echo "$file"
    fi
done
Prints
zereos.txt
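The loop above stores every matching line in a variable just to test whether it is empty. For very large files, a sketch that relies only on the exit status, and stops reading at the first 1, could look like this. The name has_no_ones and the header-count argument are hypothetical; the default of 3 matches the sample files above.

```shell
# has_no_ones FILE [HEADER_LINES]: exit 0 if no 1 appears after the header.
# Hypothetical helper; HEADER_LINES defaults to 3, as in the sample files.
has_no_ones() {
    # tail -n +K starts printing at line K, so start one line past the header.
    # grep -q exits at the first match; the leading ! inverts the result.
    ! tail -n +"$(( ${2:-3} + 1 ))" "$1" | grep -q 1
}

# Usage: for file in *; do has_no_ones "$file" && echo "$file"; done
```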
Upvotes: 1
Reputation: 15501
I think you might be looking for something like this:
gawk '
    BEGINFILE { no_ones = 1 }
    FNR < 7 { next }    # FNR restarts for each file; NR would only skip the first file's header
    /1/ { no_ones = 0; nextfile }
    ENDFILE { if (no_ones) print FILENAME }
' files...
This uses GNU awk (for BEGINFILE, ENDFILE, nextfile).
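When several files are passed to one awk invocation, NR keeps counting across all of them while FNR restarts at 1 for each file, which matters for any per-file header skip. A small demonstration with two throwaway files (the names f1 and f2 are arbitrary):

```shell
# Create two scratch files with two lines each.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\nd\n' > "$dir/f2"

# Print the file name, the global record number and the per-file record number.
out=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' f1 f2)
printf '%s\n' "$out"

rm -rf "$dir"
```

This prints

f1 1 1
f1 2 2
f2 3 1
f2 4 2

so a test like "line number < 7" only skips the header of every file when it is written against FNR.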
Upvotes: 0