Lukasz Tracewski

Reputation: 11377

Find all files that are filled with zeros

I have roughly 30 GB of text files that have 6 lines of header and then content that can be best imagined as a matrix, from tiny 1x1 to having tens of thousands of rows and columns. Numbers in the content can take only two values: 0 and 1.

I would like to find all files whose content is filled with zeros, i.e. contains not a single '1' value. Writing a script in, say, Python should be straightforward, but I would like to learn how to do this in e.g. awk, grep or sed.

One way I can think of is to use grep to search for '1': if it is not found in a given file, we have a match (since there are only two possible values). But how can I search from a specific line, i.e. skip the header?

Upvotes: 1

Views: 570

Answers (5)

Ed Morton

Reputation: 203324

awk -F'1' '
FNR>6 && NF>1 { f=1; nextfile }
ENDFILE { print FILENAME, (f ? "got a one" : "all zeros"); f=0 }
' file1 file2 ...

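With -F'1', any line containing a 1 splits into more than one field, so NF>1 flags it, and nextfile then skips the rest of that file. The above uses GNU awk for ENDFILE and nextfile.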
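If GNU awk is not available, the same test can be sketched with one awk process per file, avoiding ENDFILE and nextfile entirely. This is only an illustration, not the answer's method: the function name list_zero_files is hypothetical, and the 6-line header is taken from the question.

```shell
# list_zero_files is a hypothetical helper name (an assumption, not from the answer).
# It prints the name of every file whose body, after the 6 header lines,
# contains no "1".
list_zero_files() {
    for f in "$@"; do
        # awk stops reading at the first "1" past the header and exits 1;
        # if no "1" is found it exits 0.
        if awk 'FNR > 6 && /1/ { found = 1; exit } END { exit found + 0 }' "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

It could be called as list_zero_files *.txt. Starting one process per file is slower on many tiny files, but each process still stops at the first 1 it sees.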

Upvotes: 1

glenn jackman

Reputation: 246774

My take:

for file in *; do
    if sed 1,6d "$file" | grep -q 1; then
        echo "$file has a one"
    else
        echo "$file has no ones"
    fi
done

With GNU sed, whose q command accepts an exit status, you can drop the grep and write:

for file in *; do
    if sed -n '1,6d; /1/ q 1' "$file"; then
        echo "$file has no ones"
    else
        echo "$file has a one"
    fi
done

Upvotes: 1

clt60

Reputation: 63902

The following script counts the total number of 1s in the given file, i.e. not just the number of lines that contain a 1 but the actual number of 1s across all lines and columns:

file="somefile.txt"
tail -n +7 "$file" | grep -o 1 | grep -c '.'
^^^^^^^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^^^
        |                |            +--- count the resulting lines
        |                +---- extract every "1", each on its own line
        +-------- print the file from the 7th line onward

You can use it like this:

file="somefile"
ones=$(tail -n +7 "$file" | grep -o 1 | grep -c '.')
case "$ones" in
    0) do_something "$file" ;;       # no 1 in the file
    *) do_other "$file" "$ones" ;;   # $ones is the number of "1"s found
esac

You can also count the 1s with perl:

perl -nlE '$.<7&&next;$c+=()=m/1/g}{say $c' < filename

For example:

ones=$(perl -nlE '$.<7&&next;$c+=()=m/1/g}{say $c' < filename)
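To make the distinction between counting occurrences and counting lines concrete, here is a small sketch that puts the two side by side. The function names are hypothetical. It assumes a grep that supports -o (GNU and BSD grep do, though the option is not in POSIX) and the 6-line header from the question.

```shell
# count_ones and count_lines_with_ones are hypothetical helper names.
# Both skip the 6 header lines with tail -n +7.

count_ones() {
    # grep -o prints each match on its own line; wc -l counts those lines.
    # tr strips the padding that some wc implementations add.
    tail -n +7 "$1" | grep -o 1 | wc -l | tr -d ' '
}

count_lines_with_ones() {
    # grep -c counts matching lines, however many 1s each line contains.
    # Note: grep exits non-zero when the count is 0, but still prints 0.
    tail -n +7 "$1" | grep -c 1
}
```

For a body made of the rows "0 1 1", "0 0 0" and "1 0 0", count_ones reports 3 while count_lines_with_ones reports 2. If you only need to know whether a file is all zeros, either result being 0 is enough.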

Upvotes: 0

dawg

Reputation: 103774

Suppose I have the two files:

$ cat 1_1.txt
Header 1
Header 2
Header 3
0 0 0 0 0
0 0 0 1 0
0 0 0 0 0
$ cat zereos.txt
Header 1
Header 2
Header 3
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0

You can use sed to skip over the N header lines (3 in these sample files) and then print any lines with a 1 in them:

$ sed -n '1,3d; /1/p' zereos.txt 
$ sed -n '1,3d; /1/p' 1_1.txt 
0 0 0 1 0

So now combine that into a Bash script:

for file in *; do
    rtr=$(sed -n '1,3d; /1/p' "$file")
    if [[ -z $rtr ]]; then
        echo "$file"
    fi
done

This prints:

zereos.txt
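On very large files the loop above captures every line that contains a 1 before testing the result. A possible variation, sketched here under the assumption that only the first hit matters, tells sed to quit right after printing it. The function names and the HEADER_LINES variable are hypothetical. The default of 3 matches the sample files above, so set it to 6 for the files in the question.

```shell
# first_line_with_one and list_all_zero are hypothetical helper names.
# HEADER_LINES is an assumed variable: the number of header lines to skip
# (defaults to 3, as in the sample files).

first_line_with_one() {
    # Delete the header, then print the first line containing a 1 and quit,
    # so the rest of the file is never read.
    sed -n "1,${HEADER_LINES:-3}d; /1/{p;q;}" "$1"
}

list_all_zero() {
    for file in "$@"; do
        # An empty result means no line after the header contained a 1.
        if [ -z "$(first_line_with_one "$file")" ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

It could be run as list_all_zero *, or as HEADER_LINES=6 list_all_zero * for six header lines.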

Upvotes: 1

ooga

Reputation: 15501

I think you might be looking for something like this:

gawk '
  BEGINFILE { no_ones = 1 };
  FNR < 7 { next };
  /1/ { no_ones = 0; nextfile };
  ENDFILE { if (no_ones) print FILENAME }
' files...

This uses GNU awk (for BEGINFILE, ENDFILE, nextfile).

Upvotes: 0
