Martin Blore

Reputation: 2195

Regex for multiple web links with no whitespace?

I'm trying to parse a string that contains multiple YouTube links with no whitespace between them. Each link can start with either "http" or "https". Example string:

https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123

So there are three links in there. I have no control over that string at all: it comes from a chat service that people post links into, and it's my job to extract the URLs with a regex and record them.

Here's the Regex I've come up with so far:

(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)\/\S+

I'm not sure how to make the match stop when the next link starts further along the string, though. Can anyone help?
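To see why the regex above can't separate the links, here is a minimal C# sketch that runs it unchanged over the example string (the helper name `FindWithAskerPattern` is illustrative, not from the question). Because `\S+` is greedy and the input contains no whitespace, the first match runs to the end of the string:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The regex from the question, unchanged.
const string askerPattern = @"(https?\:\/\/)?(www\.)?(youtube\.com|youtu\.?be)\/\S+";

// Hypothetical helper: returns every match of the question's regex.
string[] FindWithAskerPattern(string input) =>
    Regex.Matches(input, askerPattern)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

string chat = "https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123";

// \S+ is greedy and there is no whitespace to stop it, so the first match
// swallows all three links and nothing is left for a second match.
string[] found = FindWithAskerPattern(chat);
Console.WriteLine(found.Length); // prints 1
```

The fix therefore has to tell the engine where a link ends, which is what the lookahead and split approaches below do.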

Upvotes: 1

Views: 200

Answers (3)

Matt Clark

Reputation: 1171

If you added a separator of some kind, like a "|", when you created the string, then you could easily split on that.

var videoUrls = input.Split('|');

Upvotes: 0

ΩmegaMan

Reputation: 31616

Here is a regex that splits out the joined links. Note that I have renamed the video IDs to abc111, abc222 and abc333 for easier debugging:

using System.Linq;
using System.Text.RegularExpressions;

string data = "https://www.youtube.com/watch?v=abc111http://www.youtube.com/watch?v=abc222https://www.youtube.com/watch?v=abc333";

string pattern = @"(?<YouTubeLink>https?.+?)(?=http|$)";

var links = Regex.Matches(data, pattern)
                 .OfType<Match>()
                 .Select(mt => mt.Groups["YouTubeLink"].Value);

/* The above results in an IEnumerable of these strings:
https://www.youtube.com/watch?v=abc111
http://www.youtube.com/watch?v=abc222
https://www.youtube.com/watch?v=abc333
*/

Explanation:

  • (?<YouTubeLink> ) : Named capture group, so each link can be read back by name after the match.
  • https? : Matches http; the ? makes the s optional, so https matches too.
  • .+? : Lazy quantifier; matches as few characters as possible.
  • (?= ) : Positive lookahead; stops .+? from grabbing more text, without consuming anything itself.
  • http|$ : Inside the lookahead, stop at the next http or at the end of the data.
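One caveat: `(?=http|$)` also stops at any "http" that happens to appear inside a link, for example in a query parameter. A minimal sketch of a slightly stricter variant, assuming every new link begins with a full `http://` or `https://` scheme, is to look ahead for the whole scheme instead. The names `linkPattern` and `ExtractLinks` and the sample string with `&src=httpfeed` are hypothetical, made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Same idea as the answer's pattern, but both the start of a link and the
// lookahead require a full scheme ("http://" or "https://").
const string linkPattern = @"(?<YouTubeLink>https?://.+?)(?=https?://|$)";

string[] ExtractLinks(string input) =>
    Regex.Matches(input, linkPattern)
         .Cast<Match>()
         .Select(m => m.Groups["YouTubeLink"].Value)
         .ToArray();

// Hypothetical input: the first link's query string contains the text "http".
string sample = "https://www.youtube.com/watch?v=abc111&src=httpfeedhttp://www.youtube.com/watch?v=abc222";

foreach (string link in ExtractLinks(sample))
    Console.WriteLine(link);
```

With the original `(?=http|$)` lookahead the same input would be cut into three pieces at "httpfeed". This variant still breaks if a literal "http://" appears inside a URL (for example an unencoded redirect target), so it narrows the problem rather than removing it.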

Upvotes: 3

dav_i

Reputation: 28107

You can just split on http and then add it back:

var input = "https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123";

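// The input starts with "http", so the first piece would be empty; drop it.
var split = input.Split(new[] { "http" }, StringSplitOptions.RemoveEmptyEntries);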

var urls = split.Select(x => "http" + x);

This, of course, assumes that "http" doesn't appear anywhere else in the URLs.
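A related sketch, offered as an alternative rather than the answer's own code: `Regex.Split` with a zero-width lookahead splits *in front of* each scheme, so "http" never has to be added back. It assumes each link starts with `http://` or `https://`; the helper name `SplitLinks` is illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split at every position immediately followed by "http://" or "https://".
// The lookahead is zero-width, so the scheme stays attached to each piece.
string[] SplitLinks(string input) =>
    Regex.Split(input, @"(?=https?://)")
         // A split point at index 0 yields a leading empty string; drop it.
         .Where(part => part.Length > 0)
         .ToArray();

string joined = "https://www.youtube.com/watch?v=abc123http://www.youtube.com/watch?v=abc123https://www.youtube.com/watch?v=abc123";

foreach (string url in SplitLinks(joined))
    Console.WriteLine(url);
```

Requiring "://" after the scheme also avoids splitting on a bare "http" inside a URL, though a literal "http://" embedded in a link would still cause a split.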

Upvotes: 4
