evan

Reputation: 61

Extract URLs from String (Ruby) (Regex and link shortened)

I heard that URI.extract only returns links that contain a : (that is, a scheme such as http:). Since the link in the tweet I am grabbing does not contain a :, I believe I have to use a regular expression. I need to check for a "swoo.sh/whatever" link and store it in a variable. How can I find the first "swoo.sh/whatever" link (apparently a match returns the first one automatically) while keeping everything after the /? For example, if the tweet says

Lorem ipsum lorem ipsum swoo.sh/12xfsW lorem ipsum

How would I grab the swoo.sh link, and all the different things that come directly after the /?

Upvotes: 1

Views: 685

Answers (2)

Max

Reputation: 22315

We can use the fact that URIs can't contain spaces, and that Ruby has URI::Generic, which will parse almost anything that looks URI-ish. Then we just need to filter out non-web URIs, which I do by assuming that every web URI has to start with something like foo.bar:

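require 'uri'
require 'pathname'

# Split the tweet on whitespace, try to parse each token as a URI
# (unparseable tokens become nil), then keep tokens that either have a
# hostname or whose first path segment looks like "foo.bar".
tweet.
  split.
  map { |s| URI.parse(s) rescue nil }.
  select { |u| u && (u.hostname || Pathname(u.path).each_filename.first =~ /\w\.\w/) }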

Example output

tweet = 'foo . < google.com bar swoosh.sh/blah?q=bar http://google.com/bar'
# the above returns
# [#<URI::Generic google.com>, #<URI::Generic swoosh.sh/blah?q=bar>, #<URI::HTTP http://google.com/bar>]

This can't really work in general because of ambiguity. "car.net" looks like a shortened link, but in context it could be "my neighbor threw a baseball through my window so i yanked the hubcaps off his car.net gain!!!", where it's clearly just a missing space.
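One way to sidestep that ambiguity, if you only care about a handful of known shorteners, is to check each token against an allowlist of hosts. This is only a rough sketch: SHORTENER_HOSTS and shortened_links are made-up names for illustration, and the list contains just the swoo.sh host from the question.

```ruby
# Hypothetical allowlist; swoo.sh comes from the question, extend as needed.
SHORTENER_HOSTS = %w[swoo.sh].freeze

# Returns the whitespace-separated tokens that look like "<known host>/<path>".
# An optional http:// or https:// prefix is ignored for the host comparison.
def shortened_links(tweet, hosts = SHORTENER_HOSTS)
  tweet.split.select do |token|
    host, path = token.sub(%r{\Ahttps?://}, "").split("/", 2)
    hosts.include?(host) && !path.to_s.empty?
  end
end

link = shortened_links("Lorem ipsum lorem ipsum swoo.sh/12xfsW lorem ipsum").first
puts link
```

With this, "car.net" is ignored because it is not on the list, at the cost of having to maintain the list yourself.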

Upvotes: 1

Tim Biegeleisen

Reputation: 520878

Here is one approach using match:

# Matches word.word/word, e.g. swoo.sh/12xfsW; match returns the first hit or nil.
match = /(\w+\.\w+\/\w+)/.match("Lorem ipsum lorem ipsum swoo.sh/12xfsW lorem ipsum")
if match
  puts match[1]
else
  puts "no match"
end


If you also need to capture full URLs (with a scheme such as http://) at the same time, then this answer would have to be updated. It only addresses your immediate question.
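For what it's worth, one possible way to extend the pattern is to make the scheme optional and let the path run to the next whitespace. Treat this as a sketch under those assumptions, not a general URL matcher: LINK_PATTERN and first_link are names invented here, and trailing punctuation glued to the link would be captured along with it.

```ruby
# Hypothetical pattern: optional http:// or https:// prefix, then a dotted
# host, a slash, and everything up to the next whitespace character.
LINK_PATTERN = %r{(?:https?://)?\w+(?:\.\w+)+/\S+}

# Returns the first substring matching LINK_PATTERN, or nil if there is none.
def first_link(text)
  text[LINK_PATTERN]
end

puts first_link("Lorem ipsum lorem ipsum swoo.sh/12xfsW lorem ipsum")
puts first_link("see http://google.com/bar for details")
```

String#[] with a regex returns the first match as a string, so there is no MatchData to unpack.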

Upvotes: 1
