Reputation: 2195
I'm trying to parse a string that contains multiple YouTube links with no whitespace between them. Each link can start with either "http" or "https". Example string:
https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123
So there are 3 links in there. I have no control over that string, as it comes from a chat service that people post links into, and it's my job to extract the URLs with a regex and record them.
Here's the Regex I've come up with so far:
(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)\/\S+
I'm not sure how to make it break when it sees a pattern further down the string though. Can anyone help?
Upvotes: 1
Views: 200
Reputation: 1171
If you added a separator of some kind, like a "|", when you created the string, then you could easily split on that.
var videoUrls = input.Split('|');
Upvotes: 0
Reputation: 31616
Here is a Regex which will split out the joined links. Note that I have renamed the links to 111, 222 and 333 for easier debugging:
string data = "https://www.youtube.com/watch?v=abc111http://www.youtube.com/watch?v=abc222https://www.youtube.com/watch?v=abc333";
string pattern = @"(?<YouTubeLink>https?.+?)(?=http|$)";
Regex.Matches(data, pattern)
     .OfType<Match>()
     .Select(mt => mt.Groups["YouTubeLink"].Value);
/* The above results in an IEnumerable of these strings:
https://www.youtube.com/watch?v=abc111
http://www.youtube.com/watch?v=abc222
https://www.youtube.com/watch?v=abc333
*/
Explanation:
(?< > ) : Named match capture, for easier data extraction after the regex has run.
s? : Matches http, where the s is made optional by the ? so that https matches too.
.+? : Capture as little as possible.
(?= ) : Look ahead, to stop the .+? from grabbing more text.
http|$ : The look-ahead stops on a new http or at the end of the data.
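For anyone who wants to try the look-ahead idea as a complete program, here is a minimal sketch. It is not the pattern from the answer above: it is a variation that combines the look-ahead with the YouTube host check from the question, so the exact pattern, the Program class, and the variable names are illustrative assumptions.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string data = "https://www.youtube.com/watch?v=abc111http://www.youtube.com/watch?v=abc222https://www.youtube.com/watch?v=abc333";

        // Assumed pattern: a YouTube URL whose lazy tail (\S+?) stops as soon as
        // the look-ahead sees the start of another URL, whitespace, or end of input.
        string pattern = @"https?://(?:www\.)?(?:youtube\.com|youtu\.be)/\S+?(?=https?://|\s|$)";

        var links = Regex.Matches(data, pattern)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();

        // Prints one link per line.
        foreach (var link in links)
        {
            Console.WriteLine(link);
        }
    }
}
```

Looking ahead for "https?://" rather than a bare "http" makes it less likely that a video ID containing the letters "http" ends a match early.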
Upvotes: 3
Reputation: 28107
You can just split on http and then add it back:
var input = "https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123";
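// RemoveEmptyEntries drops the empty string produced before the leading "http"
var split = input.Split(new[] { "http" }, StringSplitOptions.RemoveEmptyEntries);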
var urls = split.Select(x => "http" + x);
This of course assumes that "http" doesn't appear anywhere else in the URLs...
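If that assumption is a concern, one possible alternative is to split with a regex on a zero-width look-ahead for the full scheme, so nothing is consumed and nothing has to be added back. This is a sketch rather than a guaranteed fix: the sample IDs and names below are made up for illustration, and an unencoded "http://" or "https://" inside a link would still cause a split.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input: the second video ID contains the letters "http".
        var input = "https://www.youtube.com/watch?v=abc111http://www.youtube.com/watch?v=httpXYZhttps://www.youtube.com/watch?v=abc333";

        // (?=https?://) matches the empty position just before each URL scheme,
        // so the split keeps "http"/"https" attached to the link that follows.
        var urls = Regex.Split(input, @"(?=https?://)")
                        .Where(s => s.Length > 0) // the match at position 0 yields an empty first piece
                        .ToList();

        foreach (var url in urls)
        {
            Console.WriteLine(url);
        }
    }
}
```

Because the look-ahead requires "://" after the scheme, the "http" inside the second video ID does not trigger a split, and the three links come out intact.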
Upvotes: 4