Reputation: 91
I'm trying to match all domains in a string, for example:
"hello.test.com"
'hello-to.ya.com'
"test.two.for.com"
Basically, I want to match everything between single or double quotes that ends in .com
Here's what I came up with:
\.([a-z0-9-])+\.(com)
I'm testing with this visual site: https://regexr.com/
But it won't match example #3, and I want to match the outer quotes too. I'm parsing JSON in string format, so I don't want extra stuff.
Example JSON:
'DbiResourceId': 'db-ZDKG55HDKSLJ33',
'DeletionProtection': False,
'DomainMemberships': [],
'Endpoint': {'Address': 'things-dev.dj5fhdk2.us-west-2.rds.amazonaws.com',
'HostedZoneId': 'DKGH32DL4',
'Port': 1234},
Thank you so much!
Upvotes: 0
Views: 1115
Reputation: 58324
Your regex only matches a portion of each of your 3 examples. If you need to match the URL in its entirety, then you want a pattern that matches one or more substrings of the form [a-z0-9-]+\. followed by the substring com. That would look like:
([a-z0-9-]+\.)+com
You can play with it on regexr. You can add the outer single quotes if you want to match those:
'([a-z0-9-]+\.)+com'
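If it helps, here is a rough Python sketch of the same idea, extended to accept either quote style. The function name and sample text are made up for illustration, and it assumes lowercase names only, as in the question. The backreference \1 requires the closing quote to be the same kind as the opening one, and the inner repetition uses a non-capturing group (?:...) so that group 2 holds the whole domain.

```python
import re

# Assumption: domains are lowercase labels (letters, digits, hyphens)
# ending in "com", wrapped in a matching pair of single or double quotes.
# Group 1 is the opening quote, \1 demands the same quote at the end,
# and group 2 is the domain itself.
DOMAIN_RE = re.compile(r"""(['"])((?:[a-z0-9-]+\.)+com)\1""")


def find_quoted_domains(text):
    """Return every quoted .com domain found in text, without the quotes."""
    return [m.group(2) for m in DOMAIN_RE.finditer(text)]


sample = """
"hello.test.com"
'hello-to.ya.com'
"test.two.for.com"
'Endpoint': {'Address': 'things-dev.dj5fhdk2.us-west-2.rds.amazonaws.com',
'HostedZoneId': 'DKGH32DL4',
"""

for domain in find_quoted_domains(sample):
    print(domain)
```

Use m.group(0) instead of m.group(2) if you want the surrounding quotes included in the result.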
NOTE: I used your basic collection of characters for these regular expressions to help you get started, and it looks like you're doing "just enough to get by" with your particular application. However, these do not capture all valid URL names, and they do allow some invalid names (e.g., some that begin with -). If you want to make this more accurate to the URL standard, you need to have a look at RFC 3986, section 2. This describes in detail the valid characters allowed in a URL name.
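As one possible tightening, the sketch below rejects labels that begin or end with a hyphen, which is the usual hostname label rule (letters, digits, and hyphens, at most 63 characters, no hyphen at either end). It is only a sketch: the names LABEL and is_strict_com_host are invented here, it still assumes lowercase input, and it does not attempt to cover everything the RFCs allow.

```python
import re

# One hostname label: starts and ends with a letter or digit,
# hyphens allowed only in the middle, 63 characters at most.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"

# One or more labels, each followed by a dot, then the literal "com".
STRICT_COM_RE = re.compile(rf"(?:{LABEL}\.)+com")


def is_strict_com_host(name):
    """True if the whole string is a .com name made of well-formed labels."""
    return STRICT_COM_RE.fullmatch(name) is not None


for candidate in ["hello-to.ya.com", "-bad.test.com", "bad-.test.com"]:
    print(candidate, is_strict_com_host(candidate))
```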
Upvotes: 1
Reputation: 58
I am not entirely sure what you're asking in your question.
If you want:
"hello.test.com"
"hello-to.ya.com"
"test.two.for.com"
to match on
test.com
ya.com
for.com
excluding the sub domains, try:
([a-z0-9-]+\.com)
Use the multiline and global flags.
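For what it's worth, a quick Python sketch of this, with the dot written as \. so it matches only a literal period (an unescaped . matches any character). The lookahead for a closing quote is my own addition, on the assumption that the names are always quoted as in the question. re.findall already returns every match, so no global flag is needed in Python.

```python
import re

# Assumption: the part wanted is the last label plus ".com", and the
# name is immediately followed by a closing quote.
LAST_LABEL_RE = re.compile(r"""[a-z0-9-]+\.com(?=['"])""")

hosts = '''"hello.test.com"
"hello-to.ya.com"
"test.two.for.com"'''

print(LAST_LABEL_RE.findall(hosts))

# Why the escape matters: with a bare "." the pattern can match inside
# ordinary words, because "." stands for any character.
unescaped = re.findall(r"[a-z0-9-]*.com", "welcome")
escaped = re.findall(r"[a-z0-9-]+\.com", "welcome")
print(unescaped, escaped)
```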
Upvotes: 1