Reputation: 73
I want to write a bash script that prints the least frequently repeated line of standard input.
I wrote this code:
#!/bin/bash
var=1000
while read line
do
tmp=$(grep -c $line)
if [ $tmp -lt $var ]
then
var=$tmp
out=$line
fi
done
var="$var $out"
echo $var
But when I test it with input like this
id1
id2
id3
id1
square
id1
id2
id3
id1
circle
id2
id2
the program only enters the loop once, so it gives the wrong output
3 id1
when the correct one should be
1 square
This line
tmp=$(grep -c $line)
seems to be breaking the loop, but I can't figure out why. Is there a way to avoid using grep in my code, or some other way to fix my script?
Upvotes: 1
Views: 1459
Reputation: 189357
The grep command reads the remainder of standard input. You will need to copy the input to a temp file if you want to both grep it and do something else with it.
A much simpler solution to your problem is
sort | uniq -c | sort -n | head -n 1
More generally, running grep on each line in a loop over a file is an antipattern, which often suggests moving to Awk or sed instead if you can't find a simple pipeline with standard tools to accomplish your goal.
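As a rough illustration of the Awk route, here is a minimal sketch that counts every line in a single pass and prints the smallest count with its line. The function name least_repeated is hypothetical, and breaking ties in favor of the line that appeared first is an assumption, since the question does not say which line should win.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): one-pass count with awk.
least_repeated() {
  awk '
    {
      # Remember first-seen order so ties are resolved deterministically.
      if (!($0 in count)) order[++n] = $0
      count[$0]++
    }
    END {
      for (i = 1; i <= n; i++) {
        line = order[i]
        if (i == 1 || count[line] < min) { min = count[line]; out = line }
      }
      if (n) print min, out   # print nothing for empty input
    }'
}

# Usage with the sample input from the question:
printf '%s\n' id1 id2 id3 id1 square id1 id2 id3 id1 circle id2 id2 | least_repeated
```

With the question's sample input this should report square with a count of 1, and it never needs a temporary file because the input is read only once.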
Upvotes: 0
Reputation: 2691
The problem in your code is that this grep
tmp=$(grep -c $line)
will read from stdin and thus consume all the remaining lines on the very first iteration of the while loop. That is, you first read the first line into $line. Then grep searches for this string in the rest of stdin.
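The effect can be seen in a small sketch that just counts loop iterations. The function name count_iterations is hypothetical and exists only for this demonstration.

```shell
#!/bin/bash
# Hypothetical demo: count how often the loop body runs when a command
# inside the loop reads from the same standard input as `read`.
count_iterations() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
    # No file argument, so grep reads (and uses up) the rest of stdin.
    grep -c -e "$line" > /dev/null
  done
  echo "$n"
}

# Three input lines, yet the loop body runs only once.
printf '%s\n' id1 id2 id3 | count_iterations
```

After the first iteration nothing is left on stdin for read, so the loop ends.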
You could fix your code by using a temporary file, e.g.:
#!/bin/bash
tmpfile=$(mktemp)
cat > "$tmpfile"
min=0
while IFS= read -r line; do
count=$(grep -cxF -e "$line" "$tmpfile")  # -x: match whole lines only, -F: treat the line as a fixed string
if (( min == 0 || (count < min) )); then
min=$count
out="$min $line"
fi
done < <(sort -u "$tmpfile")
rm "$tmpfile"
echo "$out"
But this is of course a rather poor solution, as it uses a temporary file and reads it again for every unique line. It would be better to use something like:
#!/bin/bash
sort | uniq -c | sort -n | head -n 1
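For reference, a sketch of the same pipeline wrapped in a function and fed the sample input from the question. The function name least_repeated_sorted and the extra sed step, which only strips the leading padding that uniq -c adds, are additions for illustration and not part of the pipeline above.

```shell
#!/bin/bash
# Hypothetical wrapper around the sort/uniq pipeline.
least_repeated_sorted() {
  sort |              # group identical lines together
    uniq -c |         # prefix each distinct line with its count
    sort -n |         # order by count, smallest first
    head -n 1 |       # keep only the least frequent line
    sed 's/^ *//'     # drop the padding uniq -c puts before the count
}

printf '%s\n' id1 id2 id3 id1 square id1 id2 id3 id1 circle id2 id2 | least_repeated_sorted
```

Note that square and circle both occur exactly once in this input, so which of the two is printed depends on how sort orders lines with equal counts. Either one is a valid answer.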
Upvotes: 2