Grep specific domain and all subdomains from access.log

Question

I'm trying to grep the lines for a specific domain from my Apache2 access.log. The log contains entries for all my virtual hosts and their different domains.

cat /var/log/access.log:

www.something-else-domain.si:80 193.77.xxx.xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image.jpg HTTP/1.1" 304 - "www.something-else-domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

www.domain.si:80 193.77.xxx.xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image.jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

domain.si:80 193.77.xxx.xxx - - [06/Nov/2013:12:21:45 +0100] "GET /path/to/dir/image.jpg HTTP/1.1" 304 - "www.domain.si/index.php" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0"

I want to match only domain.si, www.domain.si and whatever.domain.si, but not something-else-domain.si. How can I do that? Thanks for your help.

Thomas · Accepted Answer

egrep '^([^ ]*\.)?domain\.si' /var/log/access.log

Taking this apart:

^ is the beginning of the line.
(xxx)? is "match xxx or nothing"; in this case, match either:
- nothing at all, which is the case of a naked domain name (domain.si)
- [^ ]*\., any string of characters that are not spaces, followed by a dot. This matches the optional www. or whatever. part.
domain\.si simply matches the domain.si part.

The anchoring with ^, along with the "no spaces" bit, ensures that you only match things at the beginning of the line (not requests like GET /domain.si).
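A quick way to convince yourself of this is to run the pattern against a handful of sample lines. The following is only a minimal sketch: the log lines are shortened, made-up stand-ins modelled on the ones in the question, the function names match_domain and sample_log are hypothetical, and grep -E is used as the equivalent spelling of egrep.

```shell
# Hypothetical helper: keep only lines whose leading vhost field is
# domain.si or a subdomain of it. Reads log lines on stdin.
match_domain() {
    grep -E '^([^ ]*\.)?domain\.si'
}

# Made-up, shortened sample lines modelled on the log in the question.
sample_log() {
    printf '%s\n' \
        'www.something-else-domain.si:80 "GET /a.jpg HTTP/1.1" 304' \
        'www.domain.si:80 "GET /b.jpg HTTP/1.1" 304' \
        'domain.si:80 "GET /c.jpg HTTP/1.1" 304' \
        'whatever.domain.si:80 "GET /d.jpg HTTP/1.1" 304' \
        'other.si:80 "GET /domain.si HTTP/1.1" 200'
}

# Prints the www.domain.si, domain.si and whatever.domain.si lines only.
sample_log | match_domain
```

The something-else-domain.si line is rejected because the character before "domain" is a hyphen rather than a dot, and the other.si line is rejected because [^ ]* cannot cross the space to reach the request path. If your log could also contain hosts such as domain.si.example.com, appending a colon to the pattern ('^([^ ]*\.)?domain\.si:') would tighten the match, assuming every line carries a :port suffix as in the sample.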
