Reputation: 51
I am using this regex to parse a URL from a semicolon-separated string.
\b(?:https?:|http?:|www\.)\S+\b
It is working fine if my input text is in these formats:
"Google;\"https://google.com\""
//output - https://google.com
"Yahoo;\"www.yahoo.com\""
//output - www.yahoo.com
But in this case it returns the wrong string:
"https://google.com;\"https://google.com\""
//output - https://google.com;\"https://google.com
How can I stop the match when it encounters a ';'?
Upvotes: 1
Views: 374
Reputation: 2304
I would personally just modify the regex to look specifically for URLs and make the http:// or https:// protocol and the www. prefix optional. Using \S+ can be kind of iffy because it grabs every non-whitespace character, whereas a URL allows only a limited set of characters.
Something like this should work great for your particular needs.
(https?:\/{2})?([w]{3}\.)?\w+\.[a-zA-Z]+
This makes the http protocol optional (the s is optional too), and it must be immediately followed by the ://. The www. prefix is optional as well. Then it grabs as many letters, digits, and underscores as possible until the ., followed by the last set of characters to end it. You can exchange the [a-zA-Z] character set for an explicit set of domains if you'd prefer.
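As a rough illustration, here is a minimal Python sketch of how that pattern behaves on the inputs from the question. The helper name first_url is hypothetical, the dot after www is escaped here so that it matches only a literal dot, and the sample strings are the runtime values of the question's literals (without the escaping backslashes), which is an assumption about the input.

```python
import re

# Pattern from the answer above; the dot after "www" is escaped (assumption).
URL_PATTERN = re.compile(r"(https?:\/{2})?([w]{3}\.)?\w+\.[a-zA-Z]+")


def first_url(text):
    """Return the first URL-like match in text, or None (hypothetical helper)."""
    match = URL_PATTERN.search(text)
    return match.group(0) if match else None


# Runtime values of the question's string literals (assumed, no backslashes).
samples = [
    'Google;"https://google.com"',
    'Yahoo;"www.yahoo.com"',
    'https://google.com;"https://google.com"',
]

for sample in samples:
    print(first_url(sample))
```

Because \w+ cannot cross the ';', the match stops at the end of the host name. Note that this pattern only covers the host part, so a path such as /search after the domain would not be included.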
Upvotes: 1
Reputation: 163467
For your example data you might use a positive lookahead (?=) and a positive lookbehind (?<=):
(?<=")(?:https?:|www\.).+?(?=;?\\")
That would match:
(?<=") Positive lookbehind to assert that what is on the left side is a double quote
(?:https?:|www\.) Match either http with an optional s or www.
.+? Match any character one or more times, non-greedy
(?=;?\\") Positive lookahead which asserts that what follows is an optional ; followed by \"
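The lookaround pattern can be tried out with a short Python sketch. It assumes the text being searched really contains the backslash before each inner quote, exactly as the examples are written in the question; if the backslashes are only C#-style escapes and are absent at runtime, the lookahead would not match. The helper name find_urls is hypothetical.

```python
import re

# Lookbehind for a double quote, lazy body, lookahead for an optional ';'
# followed by a literal backslash and a double quote.
LOOKAROUND_PATTERN = re.compile(r'(?<=")(?:https?:|www\.).+?(?=;?\\")')


def find_urls(text):
    """Return every non-overlapping match in text (hypothetical helper)."""
    return LOOKAROUND_PATTERN.findall(text)


# Raw strings: the backslashes are part of the text (assumption).
samples = [
    r'"Google;\"https://google.com\""',
    r'"Yahoo;\"www.yahoo.com\""',
    r'"https://google.com;\"https://google.com\""',
]

for sample in samples:
    print(find_urls(sample))
```

In the third sample the lazy .+? stops before the ';' because the lookahead accepts an optional ';' in front of the \", so the semicolon is never part of a match.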
Upvotes: 1
Reputation: 2991
Looking at your examples, I would just match any URL between quotation marks. Something like this:
(?<=")(?:https?:|www\.)[^"]*
Or as others have said, split the input string by the semicolon character using string.Split, and check each string sequentially for your desired match.
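Both ideas can be sketched in Python, where str.split plays the role of string.Split. The helper names quoted_url and urls_by_split are hypothetical, the per-field check reuses the alternation from the question, and the inputs are assumed to be the runtime string values without backslashes.

```python
import re

# Pattern from this answer: a URL that directly follows a double quote.
QUOTED_URL = re.compile(r'(?<=")(?:https?:|www\.)[^"]*')
# Per-field check used after splitting (assumed; based on the question's regex).
WHOLE_URL = re.compile(r"(?:https?:|www\.)\S+")


def quoted_url(text):
    """Return the first URL found right after a double quote, or None."""
    match = QUOTED_URL.search(text)
    return match.group(0) if match else None


def urls_by_split(text):
    """Split on ';', drop surrounding quotes, keep fields that look like URLs."""
    urls = []
    for field in text.split(";"):
        candidate = field.strip().strip('"')
        if WHOLE_URL.fullmatch(candidate):
            urls.append(candidate)
    return urls


sample = 'https://google.com;"https://google.com"'
print(quoted_url(sample))
print(urls_by_split(sample))
```

The quoted pattern skips the unquoted first field because of the lookbehind, while the split approach reports every field that looks like a URL, so which one fits depends on whether the unquoted field should count.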
Upvotes: 1