Cooler

Reputation: 55

Grep specific domain and all subdomains from access.log

I'm trying to grep the lines for a specific domain from my Apache2 access.log. The log contains entries for all my virtual hosts and their different domains.

cat /var/log/access.log:

www.something-else-domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image.jpg HTTP/1.1" 304 - "www.something-else-domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

www.domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image. jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

domain.si:80 193.77.xxx. xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image. jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

I want to grep only domain.si, www.domain.si and whatever.domain.si, but not something-else-domain.si. How can I do that? Thanks for the help.

Upvotes: 0

Views: 1553

Answers (2)

Thomas

Reputation: 182063

egrep '^([^ ]*\.)?domain\.si' /var/log/access.log

Taking this apart:

  • ^ is the beginning of the line.
  • (xxx)? is "match xxx or nothing"; in this case, match either:
    • nothing at all, which is the case of a naked domain name (domain.si)
    • [^ ]*\., any string of characters that are not spaces, followed by a dot. This matches the optional www. or whatever. part.
  • domain\.si simply matches the domain.si part.

The anchoring with ^, along with the "no spaces" bit, ensures that you only match things at the beginning of the line (not requests like GET /domain.si).
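
To try the pattern without touching the real log, here is a minimal sketch that runs it against a throwaway file. The sample lines, the temporary file and the trailing `:` in the pattern are assumptions added for illustration: the `:` relies on the log starting with `host:port`, as in the question, and keeps a host such as domain.si.example.org from matching. `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical sample log; the real file in the question is /var/log/access.log.
log=$(mktemp)
cat > "$log" <<'EOF'
www.something-else-domain.si:80 193.77.xxx.xxx - - "GET /domain.si HTTP/1.1" 304
www.domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
whatever.domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
domain.si.example.org:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
EOF

# Pattern from the answer, plus ':' (an added assumption) so the host must end at the port.
pattern='^([^ ]*\.)?domain\.si:'

matches=$(grep -E "$pattern" "$log")   # matching lines
count=$(grep -Ec "$pattern" "$log")    # number of matching lines

printf '%s\n' "$matches"
rm -f "$log"
```

Only the www.domain.si, domain.si and whatever.domain.si lines should be printed. The first sample line is skipped even though its request path contains domain.si.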

Upvotes: 2

Jotne

Reputation: 41460

A GNU awk solution:

awk  '/www.domain$|domanin$/ {print $NF RS}' RS=".si"
www.domain.si
"www.domain.si
"www.domain.si

There is a problem in your example: spaces are not allowed in a URL.
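
As a different take on the awk approach (not the command above), the host can be tested as a field of its own. This is a rough sketch that assumes the first whitespace-separated field is always `host:port`, as in the question's log. The sample lines and the temporary file are made up for illustration.

```shell
# Hypothetical sample log; the real file in the question is /var/log/access.log.
log=$(mktemp)
cat > "$log" <<'EOF'
www.something-else-domain.si:80 193.77.xxx.xxx - - "GET /domain.si HTTP/1.1" 304
www.domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
whatever.domain.si:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
domain.si.example.org:80 193.77.xxx.xxx - - "GET /a.jpg HTTP/1.1" 304
EOF

# Match only field 1: zero or more "label." prefixes, then domain.si, then an optional :port.
hits=$(awk '$1 ~ /^([^.]+\.)*domain\.si(:[0-9]+)?$/' "$log")
n=$(printf '%s\n' "$hits" | grep -c .)   # count non-empty result lines

printf '%s\n' "$hits"
rm -f "$log"
```

Because only `$1` is compared, the referer and request path never influence the result, which avoids the quoting seen in the output above.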

Upvotes: 0
