Joe Th

Reputation: 9

Evaluating a log file using a sh script

I have a log file with a lot of lines with the following format:

IP - - [Timestamp Zone] 'Command Weblink Format' - size

I want to write a script.sh that gives me the number of times each website has been clicked. The command:

awk '{print $7}' server.log | sort -u

should give me a list which puts each unique weblink in a separate line. The command

grep 'Weblink1' server.log | wc -l

should give me the number of times Weblink1 has been clicked. I want a command that turns each line created by the awk command above into a variable, and then a loop that runs the grep command on each extracted weblink. I could use

while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Text read from file: $line"
done

(source: Read a file line by line assigning the value to a variable) but I don't want to save the output of the Awk script in a .txt file.

My guess would be:

while IFS='' read -r line || [[ -n "$line" ]]; do
    grep '$line' server.log | wc -l | ='$variabel' |
    echo " $line was clicked $variable times "
done

But I'm not really familiar with connecting commands in a loop, as this is my first time. Would this loop work and how do I connect my loop and the Awk script?

Upvotes: 0

Views: 56

Answers (1)

dave_thompson_085

Reputation: 38781

Shell commands in a loop connect the same way they do without a loop, and you aren't very close. But yes, this can be done in a loop if you want the horribly inefficient way for some reason, such as a learning experience:

awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do 
  n=$(grep -c "$line" server.log)
  echo "$line" clicked $n times
done 

# you only need the read || [ -n ] idiom if the input can end with an
# unterminated partial line (is ill-formed); awk's print output can't.
# you don't really need the IFS= and -r because the data here is URLs 
# which cannot contain whitespace and shouldn't contain backslash,
# but I left them in as good-habit-forming.

# in general variable expansions should be doublequoted
# to prevent wordsplitting and/or globbing, although in this case 
# $line is a URL which cannot contain whitespace and practically 
# cannot be a glob. $n is a number and definitely safe.

# grep -c does the count so you don't need wc -l

or more simply

awk '{print $7}' server.log |
sort -u |
while IFS= read -r line; do 
  echo "$line" clicked $(grep -c "$line" server.log) times
done 
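If it helps to see why grep -c replaces the grep | wc -l pipeline, the two can be compared on a tiny sample log; the file name sample.log and its contents below are invented for illustration, but they follow the same field layout as the question:

```shell
# Build a three-line sample log (hypothetical data, same layout as
# the question: IP - - [Timestamp Zone] "Command Weblink Format" - size).
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /a HTTP/1.1" - 100' \
  '1.2.3.4 - - [01/Jan/2024:00:00:01 +0000] "GET /b HTTP/1.1" - 100' \
  '1.2.3.4 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" - 100' \
  > sample.log

grep -c '/a' sample.log        # counts the matching lines itself
grep '/a' sample.log | wc -l   # same count, but spawns an extra process
```

Both commands print 2 here: -c makes grep report the number of matching lines directly instead of printing the lines for wc to count.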

However if you just want the correct results, it is much more efficient and somewhat simpler to do it in one pass in awk:

awk '{n[$7]++}
    END{for(i in n){
        print i,"clicked",n[i],"times"}}' server.log |
sort
# or GNU awk 4+ can do the sort itself, see the doc:
awk '{n[$7]++}
    END{PROCINFO["sorted_in"]="@ind_str_asc";
    for(i in n){
        print i,"clicked",n[i],"times"}}' server.log

The associative array n collects the values from the seventh field as keys, and on each line, the value for the extracted key is incremented. Thus, at the end, the keys in n are all the URLs in the file, and the value for each is the number of times it occurred.
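As a quick sanity check of the one-pass approach, the same counting logic can be run on a few invented log lines fed through a pipe (the IPs and URLs below are made up; only the field positions matter):

```shell
# Hypothetical input: field 7 is the weblink; repeated URLs get counted.
printf '%s\n' \
  '1.2.3.4 - - [01/Jan/2024:00:00:00 +0000] "GET /index HTTP/1.1" - 100' \
  '5.6.7.8 - - [01/Jan/2024:00:00:01 +0000] "GET /about HTTP/1.1" - 200' \
  '1.2.3.4 - - [01/Jan/2024:00:00:02 +0000] "GET /index HTTP/1.1" - 100' |
awk '{n[$7]++}
    END{for(i in n){
        print i,"clicked",n[i],"times"}}' |
sort
# prints:
#   /about clicked 1 times
#   /index clicked 2 times
```

Each input line bumps the counter for its URL exactly once, so the whole file is read a single time, versus once per unique URL with the grep loop.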

Upvotes: 1
