How do I improve my regex to grep a third-level domain without the extra characters at the end?

Question

This regex greps the whole line. How can I grep only the domain, without the extra characters?

echo "AAAA  cccc.google.com BBBB" | grep -oE "[^\.\n]*((\.[^\.\n]*){2}$)" --color=always

I want only cccc.google.com to be grepped, not the whole line AAAA cccc.google.com BBBB. Adding \b doesn't work:
echo "AAAA cccc.google.com BBBB" | grep -oE "\b[^\. ]*((\.[^\. ]*){2}\b$)\b" --color=always

Edit: I forgot to mention that I need to grep both third-level and fourth-level domains. Here's what I mean:

g.google.com: this is a third-level domain.
a.b.google.com: this is a fourth-level domain.

My regex above was grepping the third-level domain, but it also grepped some other characters, which is why I asked. Let's say I have AAAA a.b.c.d.e.g.google.com BBBB: then {3} should give me g.google.com, and {4} or {3,4} should give me e.g.google.com, while omitting any unwanted characters. My regex does exactly that, but it also prints extra characters!

So, using this regex (from the answer, modified):
echo "AAAA d.cccc.google.com BBB" | grep -oE '\w+(\.\w+){2}'
omits the .com part, which my regex doesn't (but mine prints extra characters). Could you please modify it to work in this case?

Chase · Accepted Answer

It looks like the OP wants a parameterized regex (clarified in the comments) that can extract n domain levels, where n is variable.

Something like this should work: (?:\w+(?:\.|\b)){4}(?=\.\w+(?: |$))\.\w+

Check out the demo
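Only the count inside the braces changes between runs, so one way to make it adjustable is to wrap the command in a small shell function. This is a sketch that assumes GNU grep built with PCRE support (for -P). The function name `extract_domain` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: print the match with $1 labels before the last label (the TLD).
# Assumes GNU grep with -P (PCRE) support.
extract_domain() {
    # $1 is substituted into the {n} quantifier; \$ stays a literal end-of-line anchor.
    grep -oP "(?:\w+(?:\.|\b)){$1}(?=\.\w+(?: |\$))\.\w+"
}

echo "AAAA  a.b.c.d.e.g.google.com BBB" | extract_domain 2
# prints g.google.com
```

Passing the count as an argument keeps the regex in one place, so you don't have to edit the quantifier by hand.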

Usage

With `{2}`

$ echo "AAAA  a.b.c.d.e.g.google.com BBB" | grep -oP "(?:\w+(?:\.|\b)){2}(?=\.\w+(?: |$))\.\w+"
g.google.com

The count of 2 covers the labels before the top-level domain (i.e. com); the TLD itself is matched separately at the end.

With `{3}`

$ echo "AAAA  a.b.c.d.e.g.google.com BBB" | grep -oP "(?:\w+(?:\.|\b)){3}(?=\.\w+(?: |$))\.\w+"
e.g.google.com

The count of 3 covers the labels before the top-level domain (i.e. com); the TLD itself is matched separately at the end.

...and so on
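To see how the result grows with the count, you can run the same command in a loop over a few values of n. This is a minimal sketch that again assumes GNU grep with -P. The variable names `line` and `match` are illustrative only.

```shell
# Sample input from the examples above.
line="AAAA  a.b.c.d.e.g.google.com BBB"

# Try several label counts; each iteration prints the count and its match.
for n in 2 3 4; do
    match=$(printf '%s\n' "$line" | grep -oP "(?:\w+(?:\.|\b)){$n}(?=\.\w+(?: |\$))\.\w+")
    printf '%s: %s\n' "$n" "$match"
done
# prints:
# 2: g.google.com
# 3: e.g.google.com
# 4: d.e.g.google.com
```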

Explanation

(?:\w+(?:\.|\b)){3} <- This is the same as my original answer: it matches word characters followed by a . (or a word boundary), exactly 3 times.

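(?=\.\w+(?: |$))\.\w+ <- This acts as the stopping point for the previous part. The lookahead only succeeds where a dot, one more label, and then a space or the end of the line follow, i.e. at the start of the top-level domain. \.\w+ then matches the top-level domain itself.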
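To see why the stopping point matters, you can run only the repeated group without it. With -o, grep then reports successive non-overlapping pairs of labels, scanning from the left. The result therefore depends on where the hostname starts rather than where it ends. This is a sketch that assumes GNU grep with -P.

```shell
# Only the repeated group, without the (?=...)\.\w+ stopping point.
echo "AAAA  a.b.c.d.e.g.google.com BBB" | grep -oP "(?:\w+(?:\.|\b)){2}"
# prints:
# a.b.
# c.d.
# e.g.
# google.com
```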

Original Answer

That regex seems completely wrong. If you only want to match domains like cccc.google.com and www.google.com, but not google.com, you should use (?:\w+(?:\.|\b)){3}

Check out the demo
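A quick way to check this claim is to feed a few hostnames through the regex, one per line. This is a sketch that assumes GNU grep with -P. The sample hostnames are illustrative only.

```shell
# Only names with at least three labels should produce a match.
printf '%s\n' "cccc.google.com" "www.google.com" "google.com" |
    grep -oP "(?:\w+(?:\.|\b)){3}"
# prints:
# cccc.google.com
# www.google.com
```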

Explanation

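The primary part is \w+(?:\.|\b): it matches word characters that are immediately followed by either a . or a word boundary. A word boundary is the zero-width position between a word character and a non-word character, e.g. before a space or at the end of the line.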

This is wrapped in (?:...){3}, which requires exactly 3 such groups in a row.
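Running the building block on its own shows what each repetition consumes. This is a sketch that assumes GNU grep with -P.

```shell
# One label plus either a trailing dot or a (zero-width) word boundary.
echo "cccc.google.com" | grep -oP "\w+(?:\.|\b)"
# prints:
# cccc.
# google.
# com
```

The last piece has no dot because the \b alternative matched without consuming a character.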

To also grep fourth-level domains, just change the {3} to {3,4}:

(?:\w+(?:\.|\b)){3,4}

Check out the demo

This is how you would do it with grep:

$ echo "AAAA  cccc.google.com BBB" | grep -oP "(?:\w+(?:\.|\b)){3,4}"
cccc.google.com

And with d.cccc.google.com:

$ echo "AAAA  d.cccc.google.com BBB" | grep -oP "(?:\w+(?:\.|\b)){3,4}"
d.cccc.google.com
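Some greps (for example, some BSD greps) do not support -P. On those, you can sketch roughly the same {3,4} behaviour with extended regular expressions by spelling each label as a bracket expression instead of \w. This is a swapped-in variant, not the answer's regex. It joins labels with literal dots rather than relying on \b, and it is only checked here against the two inputs above. The variable name `label` is illustrative.

```shell
# POSIX ERE variant: one label, then two or three ".label" parts (3 or 4 labels total).
label='[[:alnum:]_]+'
for line in "AAAA  cccc.google.com BBB" "AAAA  d.cccc.google.com BBB"; do
    printf '%s\n' "$line" | grep -oE "$label(\.$label){2,3}"
done
# prints:
# cccc.google.com
# d.cccc.google.com
```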
