user1047260

Reputation: 91

Regex for matching domains (.com)

I'm trying to match all domains in a string, for example:

"hello.test.com"
'hello-to.ya.com'
"test.two.for.com"

Basically, I want to match everything between single or double quotes that ends in .com.

Here's what I came up with: \.([a-z0-9-])+\.(com)

I'm testing with this visual site: https://regexr.com/

But it won't match example #3, and I want the match to include the outer quotes too. I'm parsing JSON in string format, so I don't want to pick up anything extra.

Example JSON:

'DbiResourceId': 'db-ZDKG55HDKSLJ33',
'DeletionProtection': False,
'DomainMemberships': [],
'Endpoint': {'Address': 'things-dev.dj5fhdk2.us-west-2.rds.amazonaws.com',
             'HostedZoneId': 'DKGH32DL4',
             'Port': 1234},

Thank you so much!

Upvotes: 0

Views: 1115

Answers (2)

lurker

Reputation: 58324

Your regex only matches a portion of each of your three examples. If you need to match the domain name in its entirety, then you want a pattern that matches one or more substrings of the form [a-z0-9-]+\. followed by the substring com. That would look like:

([a-z0-9-]+\.)+com

You can play with it on regexr. You can add the outer single quotes if you want to match those:

'([a-z0-9-]+\.)+com'
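For reference, here is a minimal Python sketch of the same idea, assuming you are scanning the text with the standard re module. The function name and the sample text are made up for illustration, and the pattern is extended slightly to accept either quote type (the backreference \1 requires the closing quote to match the opening one).

```python
import re

# Opening quote (single or double), one or more "label." groups, "com",
# then the same quote again via the backreference \1.
QUOTED_COM = re.compile(r"""(['"])((?:[a-z0-9-]+\.)+com)\1""")

def find_quoted_com_domains(text: str) -> list[str]:
    """Return every quoted .com name in text, without the quotes."""
    return [m.group(2) for m in QUOTED_COM.finditer(text)]

# Illustrative sample built from the strings in the question.
sample = "\n".join([
    "'Endpoint': {'Address': 'things-dev.dj5fhdk2.us-west-2.rds.amazonaws.com',",
    "             'HostedZoneId': 'DKGH32DL4',",
    '"hello.test.com"',
    "'hello-to.ya.com'",
    '"test.two.for.com"',
])

for name in find_quoted_com_domains(sample):
    print(name)
```

Use m.group(0) instead of m.group(2) if you want the surrounding quotes included in the result.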

NOTE: I kept your basic character set in these regexes to help you get started, since it looks like "just enough to get by" is fine for your particular application. However, they do not match every valid name, and they do accept some invalid ones (e.g., names that begin with -). If you want to follow the standard more closely, have a look at RFC 3986, section 2, which describes in detail the characters allowed in a URL.
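If you do want to reject names like -bad.test.com, one possible tightening is sketched below. It assumes the usual hostname convention that each label starts and ends with a letter or digit and has hyphens only in the middle. It does not check label length or uppercase input, and the helper name is hypothetical.

```python
import re

# Assumption: a label is a letter/digit, optionally followed by more
# letters/digits/hyphens that end in a letter/digit (no edge hyphens).
LABEL = r"[a-z0-9](?:[a-z0-9-]*[a-z0-9])?"
STRICT_COM = re.compile(rf"(?:{LABEL}\.)+com")

def is_com_hostname(name: str) -> bool:
    """True if the whole string is one or more valid labels followed by com."""
    return STRICT_COM.fullmatch(name) is not None

print(is_com_hostname("hello-to.ya.com"))
print(is_com_hostname("-bad.test.com"))
```

To search inside a larger string rather than validate a whole one, the same pattern can be placed between the quotes as in the regex above.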

Upvotes: 1

Jeff

Reputation: 58

I am not entirely sure what you're asking in your question.

If you want:

"hello.test.com"
"hello-to.ya.com"
"test.two.for.com"

to match on

test.com
ya.com
for.com

excluding the sub domains, try:

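([a-z0-9-]+\.com)

The dot is escaped so that it matches a literal period rather than any character, and + requires at least one character before it. Use the multiline and global flags.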
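As a rough Python sketch of this approach (the function name is made up for illustration): re.findall already returns every match, so no global flag is needed, and the dot is escaped so it only matches a literal period. The trailing \b is my own addition so that something like foo.community is not reported as foo.com.

```python
import re

# One label, a literal dot, then "com" ending at a word boundary.
LAST_LABEL_COM = re.compile(r"[a-z0-9-]+\.com\b")

def last_two_labels(text: str) -> list[str]:
    """Return each 'name.com' found in text, dropping any subdomains."""
    return LAST_LABEL_COM.findall(text)

sample = "\"hello.test.com\" 'hello-to.ya.com' \"test.two.for.com\""
print(last_two_labels(sample))
```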

Upvotes: 1
