Reputation: 4845
I have output of the following format in bash, from a script I wrote that returns the number of duplicate file names and the file name itself within a particular directory:
19 prob561493
19 prob564972
19 prob561564
11 prob561965
8 prob562172
7 prob564449
6 prob564155
6 prob562925
6 prob562739
Using output | head -n1, I can get the first entry of the above output: 19 prob561493. However, I also want to print the other problems that share the same maximum number of duplicates, so in this case the final output should look like this:
19 prob561493
19 prob564972
19 prob561564
I tried cut -d" " | uniq -c to first get the integer from the output and then show only the unique results, but that returned ALL of the duplicate results.
How can I print only the lines that share the maximum duplicate count?
Upvotes: 0
Views: 291
Reputation: 46856
You asked how to do this in bash. I have to say that awk may provide the clearest method to achieve what you want:
awk 'NR==1{n=$1} $1==n{print;next} {exit}'
This takes the count from the first line's first field, prints every line whose first field matches it, and exits as soon as the field differs. It assumes the input is sorted in descending order, as in your example.
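For example, assuming your counting script is called dupes.sh (the name is just a placeholder) and its output is already sorted in descending order as shown, you would pipe it straight into the one-liner:
# dupes.sh stands in for whatever produces the sorted "count name" lines
./dupes.sh | awk 'NR==1{n=$1} $1==n{print;next} {exit}'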
But the task can still be handled in bash (or even just shell) alone, without spawning extra commands or subshells.
#!/bin/sh
n=0
while read -r count data; do
    # stop as soon as the count differs from the previous line's count
    if [ "$n" -gt 0 ] && [ "$count" != "$lastcount" ]; then
        break
    fi
    printf "%3d %s\n" "$count" "$data"
    lastcount=$count
    n=$((n+1))
done
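Saved as, say, top_dupes.sh (again a hypothetical name), the script reads from standard input, so it can be used as a filter in the same way:
# both names below are placeholders for your own scripts
./dupes.sh | sh top_dupes.sh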
There are zillions of ways you can achieve this.
Upvotes: 1
Reputation: 53535
Use awk to extract the '19' and grep with a regex to get the lines that start with 19\b. Assuming your file is named "output":
grep -E "^$(head -n1 output | awk '{print $1}')\b" output
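Note that \b (word boundary) is not part of POSIX ERE, although GNU grep accepts it. If portability is a concern, a variant that instead matches the literal space following the count should behave the same on this input:
# same idea, relying only on the space that separates count and name
grep "^$(head -n1 output | awk '{print $1}') " output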
Upvotes: 0
Reputation: 785316
You can use this awk:
awk 'NR==FNR{if ($1>max) max=$1; next} $1==max' file file
19 prob561493
19 prob564972
19 prob561564
In the 1st pass we get the maximum value of $1 and store it in the variable max; in the 2nd pass we print all the records whose first field equals max.
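One limitation of the two-pass approach is that the input must be a regular file, because a pipe cannot be read twice. If the counts arrive on standard input instead, a single-pass sketch that buffers the lines in memory would look like this (dupes.sh is a placeholder for your script):
# remember every line and the running maximum, then print the matching lines
./dupes.sh | awk '{line[NR]=$0; cnt[NR]=$1; if ($1>max) max=$1} END{for (i=1;i<=NR;i++) if (cnt[i]==max) print line[i]}'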
Upvotes: 0
Reputation: 3094
Assuming the file is sorted numerically in descending order on the first column, you can use awk for this in the following way:
awk 'NR==1 {max=$1} {if($1==max){print $0}}'
This grabs the first field of the first line and stores it in the variable max; only the lines whose first field matches that number are printed subsequently.
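As a side note, the same logic can be written a bit more tersely, since in awk a pattern with no action prints the whole record by default:
awk 'NR==1{max=$1} $1==max'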
Upvotes: 1
Reputation: 14609
You may first retrieve the maximum number of occurrences, and then grep that file:
NB=$(head -n1 error.dat | cut -d ' ' -f 1)
egrep "^$NB " error.dat
Here egrep is equivalent to grep -E and interprets the pattern as an extended regular expression; ^ anchors the match to the beginning of the line, and the trailing space in the pattern prevents the count from matching as a prefix of a longer number.
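In current GNU grep, egrep is a deprecated alias for grep -E, so the same command can also be written as:
grep -E "^$NB " error.dat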
Upvotes: 0