Awk, print only patterns that match the regular expression

 10.1.2.194 (197.84.211.148) - - [08/Oct/2015:09:01:44 +0000] "GET /merlin-web-za/web/images/refinements/loader.gif HTTP/1.1" 200 4178 0 1868 "http://www.autotrader.co.za/makemodel/make/chevrolet/model/aveo/caryearrangeszar/2012/search?sort=PriceAsc&locationName=Cape%20Town&latitude=-33.92584&longitude=18.42322&county=Western%20Cape" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" "ajp://10.1.4.66:8009"

I need to transform that into:

08/Oct/2015:09:01:44 GET /merlin-web-za/web/images/refinements/loader

How can I do this using awk or egrep? I tried the commands below, but the first one prints the whole line whenever it contains both of the following patterns:

awk ' /08/Oct/2015:09:[0-9]{2}:[0-9]{1,2}/ && /GET (/[a-z0-9-]{1,}){1,3}/'

and

cat file | egrep -o "08/Oct/2015:09:[0-9]{2}:[0-9]{1,}.* GET (/[a-z0-9-]{1,}){1,}"

The second one also prints everything between the two patterns, so the result is:

08/Oct/2015:09:01:44 +0000] "GET /merlin-web-za/web/images/refinements/loader

which is not exactly what I want.

Answers (1)

Wiktor Stribiżew

You may use

awk '{a=$5" "$7" "$8; gsub(/[]["]|\.[^.]*$/, "", a); print a}'

Details

The default field separator - whitespace - is used to split the line into fields.

  • a=$5" "$7" "$8; - creates a variable by joining Field 5, 7 and 8 with a space
  • gsub(/[]["]|\.[^.]*$/, "", a) - removes every [, ] and " from the variable, as well as the last . together with any 0+ chars other than . that follow it up to the end of the string (here, the file extension)
  • print a - prints the result.
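
As a quick check, the command can be run against the sample line from the question. This is only a sketch: the referrer and user agent are shortened here (the command never reads those fields), and the `line` and `result` variable names are made up for the illustration.

```shell
# Sample line from the question, with the referrer and user agent shortened.
line='10.1.2.194 (197.84.211.148) - - [08/Oct/2015:09:01:44 +0000] "GET /merlin-web-za/web/images/refinements/loader.gif HTTP/1.1" 200 4178 0 1868 "http://www.autotrader.co.za/" "Mozilla/5.0" "ajp://10.1.4.66:8009"'

# $5 is [08/Oct/2015:09:01:44, $7 is "GET and $8 is the request path.
result=$(printf '%s\n' "$line" |
  awk '{a=$5" "$7" "$8; gsub(/[]["]|\.[^.]*$/, "", a); print a}')

printf '%s\n' "$result"
# prints: 08/Oct/2015:09:01:44 GET /merlin-web-za/web/images/refinements/loader
```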

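However, the file you sent me contains several IP addresses, separated by a comma and a space, inside the first parentheses. The extra spaces shift the field numbers, so the awk command above no longer picks the right fields. In that case, you may use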

sed -E -n 's/^[^][]*\[([^][[:space:]]+)[^][]*\][ \t]+"([[:alpha:]]+[ \t]+[^[:space:]]+).*/\1 \2/p' access_log > newfile

to get the results you want, namely the timestamp, the request method (GET/POST) and the URL.

Details

  • ^ - matches start of string
  • [^][]* - any 0 or more chars other than [ and ]
  • \[ - a [ char
  • ([^][[:space:]]+) - Group 1: 1+ chars other than ], [ and whitespace
  • [^][]* - any 0 or more chars other than [ and ]
  • \] - a ] char
  • [ \t]+ - 1+ horizontal whitespace chars
  • " - a " char
  • ([[:alpha:]]+[ \t]+[^[:space:]]+) - Group 2: 1+ letters, 1+ horizontal whitespaces and then 1+ chars other than whitespace
  • .* - the rest of the string.

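The result is the Group 1 and Group 2 values joined with a space. Because of the -n option and the p flag, only lines where the substitution succeeded are printed.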
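
The difference between the two approaches can be seen on a line with two IP addresses in the parentheses. This is a sketch under assumptions: the second address (203.0.113.7) is invented for the illustration, the line is cut off after the status fields, and the variable names are hypothetical. Note that the sed command keeps the file extension; the last step is one possible way to drop it, and it would also cut a trailing dotted part of a query string, so treat it as optional.

```shell
# Hypothetical log line: the second IP address is made up for this example.
line='10.1.2.194 (197.84.211.148, 203.0.113.7) - - [08/Oct/2015:09:01:44 +0000] "GET /merlin-web-za/web/images/refinements/loader.gif HTTP/1.1" 200 4178'

# Field-based awk: the extra space inside the parentheses shifts every field by one.
awk_out=$(printf '%s\n' "$line" |
  awk '{a=$5" "$7" "$8; gsub(/[]["]|\.[^.]*$/, "", a); print a}')
printf '%s\n' "$awk_out"
# prints: - +0000 GET

# Bracket-anchored sed: does not depend on the number of fields before the timestamp.
sed_out=$(printf '%s\n' "$line" |
  sed -E -n 's/^[^][]*\[([^][[:space:]]+)[^][]*\][ \t]+"([[:alpha:]]+[ \t]+[^[:space:]]+).*/\1 \2/p')
printf '%s\n' "$sed_out"
# prints: 08/Oct/2015:09:01:44 GET /merlin-web-za/web/images/refinements/loader.gif

# Optional: strip a trailing extension (a dot followed by chars other than . and /).
trimmed=$(printf '%s\n' "$sed_out" | sed -E 's/\.[^./]*$//')
printf '%s\n' "$trimmed"
# prints: 08/Oct/2015:09:01:44 GET /merlin-web-za/web/images/refinements/loader
```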
