Reputation: 41

awk group-by on a sub-string of a column

I have the following log file:

/veratt/po/dashboard.do 
/veratt/po/dashboardfilter.do?view=R
/veratt/po/leaseagent.do?view=R
/veratt/po/dashboardfilter.do?&=R&=E&propcode=0&display=0&rateType=0&floorplan=&=Display&format=4&action=getReport
/veratt/po/leaseagent.do
/veratt/po/leaseagent.do?view=V

Desired awk output: a count of each HTTP request (minus the request parameters):

/veratt/po/dashboard.do  - 1
/veratt/po/leaseagent.do - 3
/veratt/po/dashboardfilter.do  - 2

I know the basic awk command using an array, but its output is quite different from what I need.

awk  '{a[$2]=a[$2]+1;} END {for( item in a) print item , a[item];} '

Upvotes: 0

Answers (2)

Kaz

Reputation: 58588

awk -F\? '{ count[$1]++} 
          END { for (item in count)
                  printf("%s - %d\n", item, count[item]) }' logfile

-F\?: separate fields on the ? character, so $1 is the request; if there are URL parameters, they are in $2, whose existence we ignore. Note: this could also be done using BEGIN { FS="?" }. Note: if FS is more than one character, it is treated as a regex.
{ count[$1]++ }: for each line, tally up the occurrence count of $1.
END: run this block at the end of processing all the inputs
for (item in count): iterate the item variable over the keys in the count array.
printf("%s - %d\n", item, count[item]): formatted printing of the item and its count, separated by a dash with spaces. Note: %d can be replaced by %s; awk is weakly typed.
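The pieces above can be tried end to end with a minimal sketch like the following. It assumes the six sample lines from the question are saved in a temporary file (the `logfile` and `result` shell variables are placeholders, not part of the original command). Because the order of a `for (item in count)` loop is unspecified in awk, the output is piped through `sort` here to make it repeatable.

```shell
# Save the sample requests from the question to a temporary file
# (the variable name "logfile" is a placeholder for this sketch).
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
/veratt/po/dashboard.do
/veratt/po/dashboardfilter.do?view=R
/veratt/po/leaseagent.do?view=R
/veratt/po/dashboardfilter.do?&=R&=E&propcode=0&display=0&rateType=0&floorplan=&=Display&format=4&action=getReport
/veratt/po/leaseagent.do
/veratt/po/leaseagent.do?view=V
EOF

# Split each line on "?", tally $1, and print "path - count".
# The for-in traversal order is unspecified, so sort the result.
result=$(awk -F'?' '{ count[$1]++ }
     END { for (item in count)
             printf("%s - %d\n", item, count[item]) }' "$logfile" | LC_ALL=C sort)

printf '%s\n' "$result"
rm -f "$logfile"
```

With the sample data this should list dashboard.do once, dashboardfilter.do twice, and leaseagent.do three times.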

Upvotes: 1

Haifeng Zhang

Reputation: 31903

test.txt

/veratt/po/dashboard.do
/veratt/po/dashboardfilter.do?view=R
/veratt/po/leaseagent.do?view=R
/veratt/po/dashboardfilter.do?&=R&=E&propcode=0&display=0&rateType=0&floorplan=&=Display&format=4&action=getReport
/veratt/po/leaseagent.do
/veratt/po/leaseagent.do?view=V

command:

awk 'BEGIN{FS="?"} {a[$1]++} END{for(i in a) print i, a[i]}' test.txt

output:

/veratt/po/leaseagent.do 3
/veratt/po/dashboard.do 1
/veratt/po/dashboardfilter.do 2

explanation:

BEGIN{FS="?"} sets ? as the field separator, so $1 is the substring before the first ?. This block runs only once, before any line of test.txt is processed.

{a[$1]++} uses that substring as the index into array a and increments its count by one for each line; the array entry is created automatically on first use.

END{for(i in a) print i, a[i]} iterates over the array and prints each index with its count. The END block runs once, after all lines of test.txt have been processed.
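Setting FS to ? works because each line in test.txt is nothing but the request. If the request were instead one column of a wider log line, as the `$2` in the question's attempt suggests, one option is to keep the default whitespace splitting and strip the query string from that column with `sub()`. The sketch below assumes a hypothetical two-column layout (HTTP method, then request); the sample lines and the variable names `path` and `result` are made up for illustration.

```shell
# Hypothetical two-column input: method, then request.
# Copy column 2, delete everything from the first "?" onward, then tally.
result=$(printf '%s\n' \
  'GET /veratt/po/dashboard.do' \
  'GET /veratt/po/dashboardfilter.do?view=R' \
  'GET /veratt/po/leaseagent.do?view=R' \
  'POST /veratt/po/leaseagent.do' |
awk '{ path = $2; sub(/\?.*/, "", path); a[path]++ }
     END { for (i in a) print i, a[i] }' | LC_ALL=C sort)

printf '%s\n' "$result"
```

Copying `$2` into `path` before calling `sub()` leaves the original record untouched, which matters if other columns are printed later.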

Upvotes: 0
