Mathieu

Reputation: 4797

Ruby regexp - get facebook video id from different urls with a unique regexp

I would like to extract video IDs from several different URL formats:

https://www.facebook.com/{page-name}/videos/{video-id}/
https://www.facebook.com/{username}/videos/{video-id}/
https://www.facebook.com/video.php?id={video-id}
https://www.facebook.com/video.php?v={video-id}

How can I retrieve the video IDs with a single Ruby regex?

I haven't managed to write this as a Ruby regex, but I (partially) managed to write it as a standard JS regex:

^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$

When I run the following code in Ruby, it gives me an error:

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub( ^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$ )
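The error is a syntax error: the pattern is not wrapped in a regex literal (`/.../` or `%r{...}`). Also, `gsub` called with only a pattern and no block returns an Enumerator, not a string. Even written as a literal, the trailing `/` in this URL prevents `\d+)$` from matching, and the `id=` form is not covered. Below is a minimal sketch of the same idea as a Ruby regex. It assumes each string holds exactly one URL, which is why it uses `\A` and `\z`. `FB_VIDEO_URL` is a hypothetical name.

```ruby
# One capture group per URL shape: group 1 for video.php?v=/id=, group 2 for /videos/.
FB_VIDEO_URL = %r{\Ahttps?://www\.facebook\.com/(?:video\.php\?(?:v|id)=(\d+)|[^/?]+/videos/(\d+))/?\z}

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
m = text.match(FB_VIDEO_URL)
# Whichever group participated in the match holds the id; nil if nothing matched.
id = m && (m[1] || m[2])
puts id  # prints 352355988613922
```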

Upvotes: 0

Views: 222

Answers (3)

The fourth bird

Reputation: 163577

You might use:

^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$

That would match:

The start of the string and the start of the URL:

^https?:\/\/www\.facebook\.com\/

Followed by:

.*?          # Match any character zero or more times
video        # Match video
(?:          # Non capturing group
  s          # Match s
  |          # Or
  \.php      # Match .php
  .*?        # Match any character zero or more times         
  [?&]       # Match ? or &
  (?:id|v)=  # Match id or v in non capturing group followed by =
)            # Close non capturing group
\/?          # Match optional /
(            # Capturing group (group 1)
  [^\/&\n]+  # Match one or more characters other than /, & or newline
)            # Close capturing group
.*           # Match any character zero or more times
$            # End of the string

text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
# gsub replaces the whole matched URL with capture group 1, the video id
id = text.gsub(/^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$/, "\\1")
puts id

That will result in: 352355988613922
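With `gsub(..., "\\1")`, a URL that does not match is returned unchanged instead of signalling a miss. Below is a sketch that uses `Regexp#match` instead, so a non-matching URL yields `nil`. It assumes each string is a single URL, so `\A`/`\z` replace the line anchors `^`/`$`. `FB_VIDEO_RE` and `facebook_video_id` are hypothetical names.

```ruby
# The answer's pattern as a %r{} literal, so "/" needs no escaping.
# Assumption: each string is a single URL, so \A and \z stand in for ^ and $.
FB_VIDEO_RE = %r{\Ahttps?://www\.facebook\.com/.*?video(?:s|\.php.*?[?&](?:id|v)=)/?([^/&\n]+).*\z}

# Hypothetical helper: returns capture group 1, or nil when the URL doesn't match.
def facebook_video_id(url)
  m = FB_VIDEO_RE.match(url)
  m && m[1]
end

not_video = "https://example.com/not-a-video"
p not_video.gsub(FB_VIDEO_RE, "\\1")  # => "https://example.com/not-a-video" (unchanged)
p facebook_video_id(not_video)        # => nil
p facebook_video_id("https://www.facebook.com/video.php?id=987654321")  # => "987654321"
p facebook_video_id("https://www.facebook.com/pili.morillo.56/videos/352355988613922/")  # => "352355988613922"
```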


Upvotes: 0

Aleksei Matiushkin

Reputation: 121010

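# Optional path segment (page or user name), then "video", then everything up to
# the first / or =, then the id, with an optional trailing / before the end
RE = %r[https://www\.facebook\.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)]
%w[
  https://www.facebook.com/{page-name}/videos/{video-id}/
  https://www.facebook.com/{username}/videos/{video-id}/
  https://www.facebook.com/video.php?id={video-id}
  https://www.facebook.com/video.php?v={video-id}
].map { |url| url[RE, 1] }  # String#[] with index 1 returns capture group 1
#⇒ ["{video-id}", "{video-id}", "{video-id}", "{video-id}"]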
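`url[RE, 1]` is `String#[]` called with a regexp and a capture index. It returns that group from the first match, or `nil` when there is no match. The short sketch below uses concrete numeric IDs (made-up values) and a non-Facebook URL. It reuses the pattern above with the dots in the host name escaped, written with `%r{}` delimiters.

```ruby
# The answer's pattern, with the dots in www.facebook.com escaped.
RE = %r{https://www\.facebook\.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)}

p "https://www.facebook.com/somepage/videos/123456789/"[RE, 1]  # => "123456789"
p "https://www.facebook.com/video.php?v=987654321"[RE, 1]        # => "987654321"
p "https://example.com/not-a-video"[RE, 1]                        # => nil
```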

Upvotes: 0

scagood

Reputation: 782

Here is the regexp I came up with: /(?<=\/videos\/)\d+(?=\/|$)|(?<=[?&]id=)\d+(?=&|$)|(?<=[?&]v=)\d+(?=&|$)/

Broken up into its three alternatives:

(?<=\/videos\/)\d+(?=\/|$)|
(?<=[?&]id=)\d+(?=&|$)|
(?<=[?&]v=)\d+(?=&|$)

Each of the three alternatives follows the same simple structure: (?<=beforeMatch)target(?=afterMatch). Here is the first one as an example:

(?<=\/videos\/) # Positive lookbehind
\d+             # Matching the digits
(?=\/|$)        # Positive lookahead

This means: match one or more digits (\d+), as long as they are preceded by /videos/ and followed by / or the end of the line.
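Lookbehinds and lookaheads are zero-width: they check the surrounding text without including it in the match. The sketch below shows this with the first alternative on its own, using Ruby's `MatchData#pre_match` and `#post_match` to show where the context ended up.

```ruby
url = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
m = url.match(/(?<=\/videos\/)\d+(?=\/|$)/)

p m[0]                                # => "352355988613922" (only the digits)
p m.pre_match.end_with?("/videos/")   # => true (lookbehind text stays outside the match)
p m.post_match                        # => "/" (lookahead text stays outside the match)
```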

Together, the three alternatives match an id that follows 'videos/', 'id=' or 'v='.

The full explanation:

(?<=\/videos\/) # Match only if preceded by '/videos/'
\d+             # Match the id digits
(?=\/|$)        # As long as it's followed by '/' or the EOL
|               # Or
(?<=[?&]id=)    # Match only if preceded by '?id=' or '&id='
\d+             # Match the id digits
(?=&|$)         # As long as it's followed by '&' or the EOL
|               # Or
(?<=[?&]v=)     # Match only if preceded by '?v=' or '&v='
\d+             # Match the id digits
(?=&|$)         # As long as it's followed by '&' or the EOL

Where 'EOL' means end of line.
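Because the lookarounds are zero-width, the whole match is just the digits. `String#[]` with only the regexp therefore returns the id directly, with no capture group needed. Below is a sketch applying the full pattern in Ruby. The `&ref=share` parameter is a made-up example of a trailing query string, which the `(?=&|$)` lookahead allows. `FB_ID_RE` is a hypothetical name.

```ruby
FB_ID_RE = /(?<=\/videos\/)\d+(?=\/|$)|(?<=[?&]id=)\d+(?=&|$)|(?<=[?&]v=)\d+(?=&|$)/

urls = [
  "https://www.facebook.com/pili.morillo.56/videos/352355988613922/",
  "https://www.facebook.com/video.php?id=987654321",
  "https://www.facebook.com/video.php?v=555555555&ref=share" # hypothetical extra param
]

# With no capture index, String#[] returns the whole match, i.e. only the digits.
ids = urls.map { |url| url[FB_ID_RE] }
p ids  # => ["352355988613922", "987654321", "555555555"]
```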

Upvotes: 1
