Reputation: 1196
I am trying to get a count of unique lines output to a file based on the first field, where the input lines look like:
Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
Forms.js /forms/Forms1.js http://www.gumby.com/test.htm 404
Forms.js /forms/Forms2.js http://www.gumby.com/test.htm 404
Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
Interpret.js /forms/Interpret2.js http://www.gumby.com/test.htm 404
Interpret.js /forms/Interpret3.js http://www.gumby.com/test.htm 404
To something like this:
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
I have been trying various combinations of sort and uniq, but haven't hit on it yet. I can get distinct lines using the whole line, but I just want the first field. I am currently using cygwin. I am not awk-literate, but I suspect that is the route to go. Does anyone have a handy solution?
Upvotes: 2
Views: 1283
Reputation: 203899
$ awk '!c[$1]++{v[$1]=$0} END{for (i in c) print c[i],v[i]}' file
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
The above uses the common awk idiom '!array[$n]++' to test whether a key value has been seen before (the key can be $0, $1, a combination like $4 $5, or any field you like).
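For illustration, here is that idiom on its own: '!seen[$1]++' is true only the first time a given $1 appears, so awk prints just the first line for each key. A minimal sketch; the file name /tmp/hits.txt is only an assumption for the demo:

```shell
# Hypothetical sample file in the question's format.
cat > /tmp/hits.txt <<'EOF'
Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
Forms.js /forms/Forms1.js http://www.gumby.com/test.htm 404
Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
EOF

# Prints only the first line seen for each distinct first field:
# seen[$1] is 0 (false) on first sight, then incremented, so the
# negation is true exactly once per key.
awk '!seen[$1]++' /tmp/hits.txt
```

This prints the Forms.js line once and the Interpret1.js line once; the full answer above additionally counts the repeats in c[] and prints them in END.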
Upvotes: 2
Reputation: 47149
This:
<infile awk '{ h[$1]++ } END { for(k in h) print h[k], k }'
Will get you:
3 Forms.js
3 Interpret.js
If you also want to keep the first hit use:
awk '!h[$1] { g[$1]=$0 } { h[$1]++ } END { for(k in g) print h[k], g[k] }'
Output:
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
Tested with GNU awk.
Note that this does not require input to be sorted. Also note that the results are unordered.
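Since 'for (k in g)' visits keys in arbitrary order, one way to get deterministic output is to pipe the result through sort. A sketch under the assumption that the input lives in a hypothetical /tmp/infile:

```shell
# Hypothetical sample data: one key repeated, one not.
printf '%s\n' \
  'Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404' \
  'Forms.js /forms/Forms1.js http://www.gumby.com/test.htm 404' \
  'Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404' > /tmp/infile

# Same awk program as above, then sort numerically (descending) on the
# count in field 1, breaking ties alphabetically on the key in field 2.
awk '!h[$1] { g[$1]=$0 } { h[$1]++ } END { for(k in g) print h[k], g[k] }' /tmp/infile |
  sort -k1,1nr -k2,2
```

Highest counts come out first regardless of awk's internal hash order.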
Upvotes: 4
Reputation: 4137
assuming file.txt
contains your sample input:
sort file.txt | awk -f counts.awk
returns:
3:Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3:Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
awk script file:
cat counts.awk
# output format is:
#+ TimesFirstFieldIsRepeated:FirstMatchingLineContents
# input must be sorted on the first field (hence the sort upstream)
BEGIN {
    plmatch = "";
    outline = "";
    n = 1;
}
{
    if ($1 != plmatch && NR != 1) {
        print n ":" outline;
        n = 1;
    }
    if ($1 == plmatch) {
        n += 1;
    } else {
        # first line of a new group: remember it for the output
        outline = $0;
    }
    plmatch = $1;
}
END {
    print n ":" outline;
}
Upvotes: 1
Reputation: 85835
Awk is the tool for this, but if you want to be clever with uniq:
$ column -t file | uniq -w12 -c
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
column -t aligns all the columns so we get a fixed width for column one.
Or, a hack if column isn't available: append the first column to the end of the line with awk, then use uniq -c -f4 to count unique on the last column, and use awk again to print all but the last field.
$ awk '{print $0, $1}' file | uniq -c -f4 | awk '{$NF=""; NF--; print}'
3 Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404
3 Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404
It would be nice if uniq -f worked like -f4,4 or -f1,1.
Or you could use rev to reverse the file so uniq -c -f3 can be done, and then rev back (you get the count at the end, however, and if you don't have column you probably don't have rev).
$ rev file | uniq -c -f3 | rev
Forms.js /forms/Forms.js http://www.gumby.com/test.htm 404 3
Interpret.js /forms/Interpret1.js http://www.gumby.com/test.htm 404 3
Upvotes: 2
Reputation: 1478
You can count occurrences of the first field with cut, but what do you want printed after this field?
cut -d " " -f 1 file | uniq -c
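One caveat worth noting: uniq only collapses adjacent duplicates. The sample input happens to already be grouped by the first field, but for arbitrary input you would sort between cut and uniq. A sketch with made-up two-field data:

```shell
# Hypothetical unsorted input: the two 'b' keys are not adjacent.
printf '%s\n' 'b x' 'a y' 'b z' > /tmp/unsorted

# Without sort, uniq -c counts each run separately,
# so 'b' shows up twice with a count of 1 each.
cut -d ' ' -f 1 /tmp/unsorted | uniq -c

# Sorting first groups equal keys, giving the true totals.
cut -d ' ' -f 1 /tmp/unsorted | sort | uniq -c
```

The second pipeline reports a count of 2 for b and 1 for a.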
Upvotes: 0
Reputation: 2678
I'd just use cut -d ' ' -f 1 | uniq -c
. That won't give you the whole line, but if the lines differ, printing any one of them wouldn't make much sense anyway. It depends on what you want to achieve.
Upvotes: 0