userABC123

Reputation: 1500

Get lengths of zeroes (interrupted by ones)

I have a long column of ones and zeroes:

0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
1
....

I can easily get the average number of zeroes between ones (just total/ones):

ones=$(grep -c 1 file.txt)
lines=$(wc -l < file.txt)
echo "$lines / $ones" | bc -l

But how can I get the length of strings of zeroes between the ones? In the short example above it would be:

3
5
5
2

Upvotes: 15

Views: 1415

Answers (14)

hek2mgl

Reputation: 158010

The simplest solution would be to use sed together with awk, like this:

sed -n '$bp;/0/{:r;N;/0$/{h;br}};/1/{x;bp};:p;/.\+/{s/\n//g;p}' input.txt \
  | awk '{print length}'

Explanation:

The sed command separates the 0s and creates output like this:

000
00000
00000
00

Piping this into awk '{print length}' gives the count of 0s in each interval:

Output:

3
5
5
2

Upvotes: 2

fedorqui

Reputation: 289755

Expanding erickson's excellent answer, you can say:

$ uniq -c file | awk '!$2 {print $1}'
3
5
5
2

From man uniq we see that the purpose of uniq is to:

Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).

So uniq groups the numbers. Using the -c option we get a prefix with the number of occurrences:

$ uniq -c file
      3 0
      1 1
      5 0
      1 1
      5 0
      1 1
      2 0
      1 1

Then it is a matter of printing the counters that precede a 0. For this we can use awk: awk '!$2 {print $1}'. That is: print the first field (the count) if the second field is 0.
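The two stages can be tried on the question's sample without creating a file. This is only a minimal sketch: the function name zero_runs_uniq is hypothetical, and it uses the explicit test $2 == 0, which behaves like !$2 for this input.

```shell
# zero_runs_uniq: hypothetical helper name, not part of the answer above.
# Reads a column of 0/1 lines on stdin and prints the length of each run of zeroes.
zero_runs_uniq() {
    # uniq -c collapses adjacent identical lines into "count value";
    # awk keeps the count ($1) only for groups whose value ($2) is 0.
    uniq -c | awk '$2 == 0 {print $1}'
}

# Sample data from the question, fed through a pipe.
printf '%s\n' 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 | zero_runs_uniq
```

Because uniq groups adjacent ones into a single line as well (for example "2 1"), two consecutive ones do not produce a zero-length run in the output.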

Upvotes: 2

clt60

Reputation: 63922

A stranger (and not fully correct) way:

perl -0x31 -laE 'say @F+0' <file

prints

3
5
5
2
0

It

  • reads the file with the record separator set to the character 1 (-0x31)
  • autosplits each record into the array @F (-a)
  • prints the number of elements in @F (say @F+0; say scalar @F would also work)

Unfortunately, the newline after the final 1 (the record separator) forms one more, empty record, so the command prints a trailing 0.

This solution is incorrect; I am showing it only as a curiosity.

Upvotes: 2

Tiago Lopo

Reputation: 7959

If you can use perl:

perl -lne 'BEGIN{$counter=0;} if ($_ == 1){ print $counter; $counter=0; next} $counter++' file
3
5
5
2

It actually looks better in awk, using the same logic (c+0 makes awk print 0 rather than an empty line if the file starts with a 1):

awk '$1{print c+0; c=0} !$1{c++}' file
3
5
5
2

Upvotes: 3

gniourf_gniourf

Reputation: 46833

A funny one, in pure Bash:

while read -d 1 -a u || ((${#u[@]})); do
    echo "${#u[@]}"
done < file

This tells read to use 1 as a delimiter, i.e., to stop reading as soon as a 1 is encountered; read stores the 0's in the fields of the array u. Then we only need to count the number of fields in u with ${#u[@]}. The || ((${#u[@]})) is here just in case your file doesn't end with a 1.

Upvotes: 2

hek2mgl

Reputation: 158010

You can use awk:

awk '$1=="0"{s++} $1=="1"{if(s)print s;s=0} END{if(s)print(s)}'

Explanation:

The special variable $1 contains the value of the first field (column) of a line of text. Unless you specify the field delimiter using the -F command-line option, it defaults to whitespace, meaning $1 will contain 0 or 1 in your example.

If the value of $1 equals 0, a variable called s is incremented; if $1 equals 1, the current value of s is printed (if greater than zero) and reset to 0. (Note that awk initializes s to 0 before the first increment operation.)

The END block is executed after the last line of input has been processed. If the file ends with 0(s), the number of 0s between the last 1 and the end of the file gets printed. (Without the END block they wouldn't be printed.)
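The effect of the END block is easiest to see on input that ends in zeroes. A small sketch, assuming a made-up six-line sample; the function names sample, runs_no_end and runs_with_end are hypothetical:

```shell
# Hypothetical sample: the column ends with two zeroes and no closing 1.
sample() { printf '%s\n' 0 0 0 1 0 0; }

# Same program as above but without the END block:
# the trailing run of zeroes is counted but never printed.
runs_no_end() { awk '$1=="0"{s++} $1=="1"{if(s)print s;s=0}'; }

# With the END block: the trailing run is printed after the last input line.
runs_with_end() { awk '$1=="0"{s++} $1=="1"{if(s)print s;s=0} END{if(s)print s}'; }

echo "without END:"; sample | runs_no_end
echo "with END:";    sample | runs_with_end
```

The first command prints only 3, while the second prints 3 and then 2.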

Output:

3
5
5
2

Upvotes: 3

erickson

Reputation: 269687

I'd include uniq for a more easily read approach:

uniq -c file.txt | awk '/ 0$/ {print $1}'

Upvotes: 17

fedorqui

Reputation: 289755

Using awk, I would use the fact that a field with the value 0 evaluates as False:

awk '!$1{s++; next} {if (s) print s; s=0} END {if (s) print s}' file

This returns:

3
5
5
2

Also, note the END block to print any "remaining" zeroes appearing after the last 1.

Explanation

  • !$1{s++; next} if the field is not True, that is, if the field is 0, increment the counter. Then, skip to the next line.
  • {if (s) print s; s=0} otherwise, print the value of the counter and reset it, but only if it is non-zero (to avoid printing 0 if the file starts with a 1).
  • END {if (s) print s} print the remaining value of the counter after processing the file, but only if it wasn't printed before.
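The three rules can be exercised on input that hits each edge case. This is a sketch with made-up data (a leading 1, two adjacent 1s, and trailing zeroes); the wrapper name zero_runs is hypothetical:

```shell
# zero_runs: hypothetical wrapper around the awk program from this answer.
zero_runs() {
    awk '!$1{s++; next} {if (s) print s; s=0} END {if (s) print s}'
}

# Leading 1 (nothing printed), a run of two zeroes, two adjacent 1s
# (no 0 printed between them), then three trailing zeroes handled by END.
printf '%s\n' 1 0 0 1 1 0 0 0 | zero_runs
```

This prints 2 and then 3. One caveat of relying on truthiness: an empty first field is also false in awk, so a blank line in the input would be counted as a zero.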

Upvotes: 7

anubhava

Reputation: 785196

This seems to be a pretty popular question today. Joining the party late, here is another short GNU awk command to do the job:

awk -F '\n' -v RS='(1\n)+' 'NF{print NF-1}' file
3
5
5
2

How it works:

-F '\n'           # set input field separator as \n (newline)
-v RS='(1\n)+'    # set input record separator to one or more repetitions of 1 followed by newline
NF                # execute the block if at least one field is found
print NF-1        # print num of field -1 to get count of 0
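To see why NF-1 is the right count, it may help to split one record by hand. The sketch below calls awk's split() on a hard-coded string, which is an assumption standing in for one record as the RS above would delimit it, so it does not itself depend on a regex RS. The function name fields_and_zeroes is hypothetical.

```shell
# One record as delimited by RS='(1\n)+': three zeroes, each followed by a newline.
# Splitting on "\n" yields "0", "0", "0" and one final empty field,
# so the number of zeroes is the field count minus one.
fields_and_zeroes() {
    awk 'BEGIN { n = split("0\n0\n0\n", f, "\n"); print n, n - 1 }'
}

fields_and_zeroes
```

This prints "4 3": four fields, three zeroes.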

Upvotes: 5

Ben Grimm

Reputation: 4371

If your file.txt is just a column of ones and zeros, you can use awk and change the record separator to "1\n". This makes each "record" a sequence of "0\n", and the count of 0's in the record is the length of the record divided by 2. Counts will be correct for leading and trailing ones and zeros.

awk 'BEGIN {RS="1\n"} { print length/2 }' file.txt
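The length/2 arithmetic can be checked by hand on a single record. This is a minimal sketch, assuming a record of three zeroes as awk would see it between two separators; the variable name record_bytes is hypothetical. It uses wc -c rather than awk so that it does not rely on a multi-character RS, which POSIX leaves unspecified (gawk and mawk accept it).

```shell
# One "record" with RS="1\n": three zeroes, each followed by a newline,
# i.e. two characters per zero.
record_bytes=$(printf '0\n0\n0\n' | wc -c)

# Six characters divided by two gives the number of zeroes in the record.
echo $(( record_bytes / 2 ))
```

This prints 3, matching the first run in the question's sample.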

Upvotes: 5

clt60

Reputation: 63922

Another way:

perl -lnE 'if(m/1/){say $.-1;$.=0}' < file

This "resets" the line counter ($.) whenever a line contains a 1.

prints

3
5
5
2

Upvotes: 3

rici

Reputation: 241741

Edit: fixed for the case where the last line is a 0

Easy in awk:

awk '/1/{print NR-prev-1; prev=NR;}END{if (NR>prev)print NR-prev;}'

Not so difficult in bash, either:

i=0
for x in $(<file.txt); do
  if ((x)); then echo $i; i=0; else ((++i)); fi
done
((i)) && echo $i 

Upvotes: 10

Sas

Reputation: 2503

My attempt. Not so pretty but.. :3

grep -n 1 test.txt | gawk '{y=$1-x; print y-1; x=$1}' FS=":"

Out:

3
5
5
2

Upvotes: 2

choroba

Reputation: 241898

Pure bash:

sum=0
while read n ; do
    if ((n)) ; then
        echo $sum
        sum=0
    else
        ((++sum))
    fi
done < file.txt
((sum)) && echo $sum # Don't forget to output the last number if the file ended in 0.

Upvotes: 3
