Reputation: 637
I'm trying to write a regex in Python that will match either a URL (for example https://www.foo.com/) or a domain that starts with "sc-domain:" but does not have https or a path.
For example, the below entries should pass
https://www.foo.com/
https://www.foo.com/bar/
sc-domain:www.foo.com
However the below entries should fail
htps://www.foo.com/
https:/www.foo.com/bar/
sc-domain:www.foo.com/
sc-domain:www.foo.com/bar
scdomain:www.foo.com
Right now I'm working with the below:
^(https://*/|sc-domain:^[^/]*$)
This almost works, but still allows submissions like sc-domain:www.foo.com/ to go through. Specifically, the ^[^/]*$ part doesn't capture that a '/' should not pass.
Upvotes: 1
Views: 105
Reputation: 27723
This expression would also do that, using two simple capturing groups that you can modify as you wish:
^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$
The outer group makes ^ and $ apply to both alternatives, and the dots are escaped so that they only match a literal dot. I have also added http, which you can remove if it is undesired.
const regex = /^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$/gm;
const str = `https://www.foo.com/
https://www.foo.com/bar/
sc-domain:www.foo.com
http://www.foo.com/
http://www.foo.com/bar/
`;
const subst = `$1`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
You can simply test with Python and add the capturing groups that are desired:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
# The outer group makes ^ and $ apply to both alternatives; dots are escaped to match literally
regex = r"^(((http|https)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$"
test_str = ("https://www.foo.com/\n"
            "https://www.foo.com/bar/\n"
            "sc-domain:www.foo.com\n"
            "http://www.foo.com/\n"
            "http://www.foo.com/bar/\n\n"
            "htps://www.foo.com/\n"
            "https:/www.foo.com/bar/\n"
            "sc-domain:www.foo.com/\n"
            "sc-domain:www.foo.com/bar\n"
            "scdomain:www.foo.com")
# Python replacement strings use \1, \3 (not $1, $3): group 1 is the whole entry, group 3 the scheme.
# Python 3.5+ substitutes an empty string for a group that did not take part in the match.
subst = r"\1 \3"
# You can manually specify the number of replacements by changing the count argument
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Based on Pushpesh's advice, you can use an optional quantifier (https?) and simplify it to:
^(((https?)(:\/\/www\.foo\.com)(\/.*))|(sc-domain:www\.foo\.com))$
Upvotes: 1
Reputation: 18357
You can use this regex,
^(?:https?://www\.foo\.com(?:/\S*)*|sc-domain:www\.foo\.com)$
Explanation:
^
- Start of line
(?:
- Start of non-capturing group for alternation
https?://www\.foo\.com(?:/\S*)*
- This matches a URL starting with http:// or https://, followed by www.foo.com, optionally followed by a path
|
- Alternation, for strings starting with sc-domain:
sc-domain:www\.foo\.com
- This part matches sc-domain: followed by www.foo.com and does not allow any file path
)$
- Close of the non-capturing group and end of line
Also, I'm not sure whether you wanted to allow any domain, but in case you do, you can use this regex,
^(?:https?://(?:\w+\.)+\w+(?:/\S*)*|sc-domain:(?:\w+\.)+\w+)$
Regex Demo allowing any domain
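A quick way to check the any-domain pattern against the samples from the question is sketched below. The is_valid helper and the two sample lists are names made up for illustration, and re.fullmatch is used on the assumption that each entry is validated as one whole string.

```python
import re

# Any-domain pattern from the answer above, compiled once for reuse.
PATTERN = re.compile(
    r"^(?:https?://(?:\w+\.)+\w+(?:/\S*)*|sc-domain:(?:\w+\.)+\w+)$"
)


def is_valid(entry: str) -> bool:
    """Hypothetical helper: True only if the whole entry matches the pattern."""
    return PATTERN.fullmatch(entry) is not None


# Samples taken from the question.
should_pass = [
    "https://www.foo.com/",
    "https://www.foo.com/bar/",
    "sc-domain:www.foo.com",
]
should_fail = [
    "htps://www.foo.com/",
    "https:/www.foo.com/bar/",
    "sc-domain:www.foo.com/",
    "sc-domain:www.foo.com/bar",
    "scdomain:www.foo.com",
]

for entry in should_pass + should_fail:
    print(f"{entry!r}: {is_valid(entry)}")
```

Note that this pattern also accepts http:// URLs and URLs without a trailing slash, which the question does not list either way.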
Upvotes: 1
Reputation: 67968
^((?:https://\S+)|(?:sc-domain:[^/\s]+))$
You can try this.
See demo.
https://regex101.com/r/xXSayK/2
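One Python-specific detail to keep in mind with a pattern like this: with re.match, $ also matches just before a trailing newline, so an entry such as "sc-domain:www.foo.com\n" still passes. A minimal sketch, assuming such input should be rejected, is to use re.fullmatch instead (the validate name is made up for illustration):

```python
import re

# Pattern from the answer above.
PATTERN = r"^((?:https://\S+)|(?:sc-domain:[^/\s]+))$"


def validate(entry: str) -> bool:
    """Hypothetical helper: the entire string must match, trailing newline included."""
    return re.fullmatch(PATTERN, entry) is not None


newline_entry = "sc-domain:www.foo.com\n"

# re.match accepts this because $ can match right before a final newline.
print(bool(re.match(PATTERN, newline_entry)))
# re.fullmatch rejects it because the newline itself is never consumed.
print(validate(newline_entry))

print(validate("sc-domain:www.foo.com"))
print(validate("sc-domain:www.foo.com/"))
```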
Upvotes: 4