Reputation: 2027
My regex is poor and letting me down so some help would be great here.
All I want to do is return all the links which appear in a tweet (just a string) - Some examples are:
"Great summary http://mytest.com/blog/post.html (#test)
"http://mytest.com/blog/post.html (#test)
"post: http://mytest.com/blog/post.html"
It should also support multiple links like:
"read http://mytest.com/blog/post.html and http://mytest.com/blog/post_two.html"
Any help would be great!
Thanks
Ben
Upvotes: 3
Views: 1283
Reputation: 18305
Found this one here
^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~/|/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d{2}])+=(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
Upvotes: 1
Reputation: 648
I realize this question is from 2009, but Twitter's API now returns URLs (and expands t.co links).
Upvotes: 0
Reputation:
Here's a code snippet from a site I wrote that parses a twitter feed. It parses links, hash tags, and twitter usernames. So far it's worked fine. I know it's not Ruby, but the regex should be helpful.
if(tweetStream[i] != null)
{
var str = tweetStream[i].Text;
var re = new Regex(@"http(s)?:\/\/\S+");
MatchCollection mc = re.Matches(tweetStream[i].Text);
foreach (Match m in mc)
{
str = str.Replace(m.Value, "<a href='" + m.Value + "' target='_blank'>" + m.Value + "</a>");
}
re = new Regex(@"(@)(\w+)");
mc = re.Matches(tweetStream[i].Text);
foreach (Match m in mc)
{
str = str.Replace(m.Value, "<a href='http://twitter.com/" + m.Value.Replace("@",string.Empty) + "' target='_blank'>" + m.Value + "</a>");
}
re = new Regex(@"(#)(\w+)");
mc = re.Matches(tweetStream[i].Text);
foreach (Match m in mc)
{
str = str.Replace(m.Value, "<a href='http://twitter.com/#search?q=" + m.Value.Replace("#", "%23") + "' target='_blank'>" + m.Value + "</a>");
}
tweets += string1 + "<div>" + str + "</div>" + string2;
}
Upvotes: 1
Reputation: 162781
Try this one:
/\bhttps?:\/\/\S+\b/
Update:
To catch links beginning with "www." too (no "http://" prefix), you could try this:
/\b(?:https?:\/\/|www\.)\S+\b/
Upvotes: 2