Reputation: 1500
I have a long column of ones and zeroes:
0
0
0
1
0
0
0
0
0
1
0
0
0
0
0
1
0
0
1
....
I can easily get the average number of zeroes between ones (just total/ones):
ones=$(grep -c 1 file.txt)
lines=$(wc -l < file.txt)
echo "$lines / $ones" | bc -l
But how can I get the length of strings of zeroes between the ones? In the short example above it would be:
3
5
5
2
Upvotes: 15
Views: 1415
Reputation: 158010
The simplest solution would be to use sed together with awk, like this:
sed -n '$bp;/0/{:r;N;/0$/{h;br}};/1/{x;bp};:p;/.\+/{s/\n//g;p}' input.txt \
| awk '{print length}'
Explanation:
The sed command separates the 0s and creates output like this:
000
00000
00000
00
Piped into awk '{print length}' you get the count of 0s for each interval:
Output:
3
5
5
2
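As a self-contained check, the whole pipeline can be run on the question's sample column (recreated here with printf, an assumption for illustration):

```shell
# Recreate the sample column from the question (assumed data) and
# run the sed/awk pipeline over it: sed collapses each run of
# zeroes onto one line, awk prints the line lengths.
printf '%s\n' 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 |
  sed -n '$bp;/0/{:r;N;/0$/{h;br}};/1/{x;bp};:p;/.\+/{s/\n//g;p}' |
  awk '{print length}'
# Prints 3, 5, 5, 2 (one per line).
```

Note the `\+` in the sed expression is a GNU sed extension, so this sketch assumes GNU sed.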
Upvotes: 2
Reputation: 289755
Expanding erickson's excellent answer, you can say:
$ uniq -c file | awk '!$2 {print $1}'
3
5
5
2
From man uniq we see that the purpose of uniq is to:
Filter adjacent matching lines from INPUT (or standard input), writing to OUTPUT (or standard output).
So uniq groups the numbers. Using the -c option we get a prefix with the number of occurrences:
$ uniq -c file
3 0
1 1
5 0
1 1
5 0
1 1
2 0
1 1
Then it is a matter of printing the counters that appear before each 0. For this we can use awk like: awk '!$2 {print $1}'. That is: print the first field if the second field is 0.
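Putting the two steps together on the question's sample data (generated here inline with printf, an assumption for illustration):

```shell
# Group adjacent identical lines and prefix each group with its
# count, then keep only the counts whose value column is 0.
printf '%s\n' 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 |
  uniq -c |
  awk '!$2 {print $1}'
# Prints 3, 5, 5, 2 (one per line).
```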
Upvotes: 2
Reputation: 63922
A stranger (and not fully correct) way:
perl -0x31 -laE 'say @F+0' <file
prints
3
5
5
2
0
It reads the input with 1 as the record separator (the -0x31; 0x31 is the character code of 1), autosplits each record into the array @F (the -a), and prints the number of elements in @F, e.g. say @F+0, or one could use say scalar @F.
Unfortunately, after the final 1 (as record separator) it reads an empty record - therefore it prints the last 0.
It is an incorrect solution, shown here only as an alternative curiosity.
Upvotes: 2
Reputation: 7959
If you can use perl:
perl -lne 'BEGIN{$counter=0;} if ($_ == 1){ print $counter; $counter=0; next} $counter++' file
3
5
5
2
It actually looks better with awk, same logic:
awk '$1{print c; c=0} !$1{c++}' file
3
5
5
2
Upvotes: 3
Reputation: 46833
A funny one, in pure Bash:
while read -d 1 -a u || ((${#u[@]})); do
echo "${#u[@]}"
done < file
This tells read to use 1 as a delimiter, i.e., to stop reading as soon as a 1 is encountered; read stores the 0's in the fields of the array u. Then we only need to count the number of fields in u with ${#u[@]}. The || ((${#u[@]})) is here just in case your file doesn't end with a 1.
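To try it without a file, the loop can read from a pipe instead (sample data generated inline, an assumption for illustration; this requires bash for read -a and arrays):

```shell
#!/usr/bin/env bash
# Pipe the sample column into the read loop from the answer:
# read stops at each literal "1" and fills the array u with the
# zeroes seen so far, so ${#u[@]} is the run length.
printf '%s\n' 0 0 0 1 0 0 0 0 0 1 0 0 1 |
  while read -d 1 -a u || ((${#u[@]})); do
    echo "${#u[@]}"
  done
# Prints 3, 5, 2 (one per line).
```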
Upvotes: 2
Reputation: 158010
You can use awk:
awk '$1=="0"{s++} $1=="1"{if(s)print s;s=0} END{if(s)print(s)}'
Explanation:
The special variable $1 contains the value of the first field (column) of a line of text. Unless you specify the field delimiter using the -F command line option it defaults to whitespace - meaning $1 will contain 0 or 1 in your example.
If the value of $1 equals 0 a variable called s gets incremented, but if $1 is equal to 1 the current value of s gets printed (if greater than zero) and re-initialized to 0. (Note that awk initializes s with 0 before the first increment operation.)
The END block gets executed after the last line of input has been processed. If the file ends with 0(s), the number of 0s between the file's end and the last 1 gets printed. (Without the END block they wouldn't get printed.)
Output:
3
5
5
2
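To see the END block in action, here is the same program run on a shorter variant of the sample that ends in zeroes (data made up for illustration):

```shell
# The input ends in two zeroes; without the END block the final
# run would be lost, since no "1" follows to trigger the print.
printf '%s\n' 0 0 0 1 0 0 |
  awk '$1=="0"{s++} $1=="1"{if(s)print s;s=0} END{if(s)print(s)}'
# Prints 3, then 2.
```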
Upvotes: 3
Reputation: 269687
I'd include uniq for a more easily read approach:
uniq -c file.txt | awk '/ 0$/ {print $1}'
Upvotes: 17
Reputation: 289755
Using awk, I would use the fact that a field with the value 0 evaluates as False:
awk '!$1{s++; next} {if (s) print s; s=0} END {if (s) print s}' file
This returns:
3
5
5
2
Also, note the END block to print any "remaining" zeroes appearing after the last 1.

!$1{s++; next} - if the field is not True, that is, if the field is 0, increment the counter; then skip to the next line.
{if (s) print s; s=0} - otherwise, print the value of the counter and reset it, but only if it contains some value (to avoid printing 0 if the file starts with a 1).
END {if (s) print s} - print the remaining value of the counter after processing the file, but only if it wasn't printed before.

Upvotes: 7
Reputation: 785196
This seems to be a pretty popular question today. Joining the party late, here is another short gnu-awk command to do the job:
awk -F '\n' -v RS='(1\n)+' 'NF{print NF-1}' file
3
5
5
2
How it works:
-F '\n'         # set the input field separator to \n (newline)
-v RS='(1\n)+'  # set the input record separator to one or more 1s, each followed by a newline
NF              # execute the block only if at least one field is found
print NF-1      # print the number of fields minus 1 to get the count of 0s
Upvotes: 5
Reputation: 4371
If your file.txt is just a column of ones and zeros, you can use awk and change the record separator to "1\n" (treating RS as a multi-character string is a GNU awk extension). This makes each "record" a sequence of "0\n", and the count of 0's in the record is the length of the record divided by 2. Counts will be correct for leading and trailing ones and zeros.
awk 'BEGIN {RS="1\n"} { print length/2 }' file.txt
Upvotes: 5
Reputation: 63922
Another way:
perl -lnE 'if(m/1/){say $.-1;$.=0}' < file
"reset" the line counter when 1
.
prints
3
5
5
2
Upvotes: 3
Reputation: 241741
Edit: fixed for the case where the last line is a 0
Easy in awk:
awk '/1/{print NR-prev-1; prev=NR;}END{if (NR>prev)print NR-prev;}'
Not so difficult in bash, either:
i=0
for x in $(<file.txt); do
if ((x)); then echo $i; i=0; else ((++i)); fi
done
((i)) && echo $i
Upvotes: 10
Reputation: 2503
My attempt. Not so pretty but.. :3
grep -n 1 test.txt | gawk '{y=$1-x; print y-1; x=$1}' FS=":"
Out:
3
5
5
2
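What drives this is the grep -n prefix: it emits the line numbers of the 1s, and the awk part takes differences between consecutive numbers (shown here on inline sample data, an assumption for illustration; like the perl variant above, trailing zeroes after the last 1 are not counted):

```shell
# grep -n prints lines like "4:1", "7:1"; with FS=":" awk takes
# $1 as the line number and prints the gap minus one.
printf '%s\n' 0 0 0 1 0 0 1 |
  grep -n 1 |
  awk '{y=$1-x; print y-1; x=$1}' FS=":"
# Prints 3, then 2.
```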
Upvotes: 2
Reputation: 241898
Pure bash:
sum=0
while read n ; do
if ((n)) ; then
echo $sum
sum=0
else
((++sum))
fi
done < file.txt
((sum)) && echo $sum # Don't forget to output the last number if the file ended in 0.
Upvotes: 3