Reputation: 4797
I would like to extract video IDs from several possible URL formats:
https://www.facebook.com/{page-name}/videos/{video-id}/
https://www.facebook.com/{username}/videos/{video-id}/
https://www.facebook.com/video.php?id={video-id}
https://www.facebook.com/video.php?v={video-id}
How can I retrieve the video IDs with a single Ruby regex?
I haven't managed to write this as a Ruby regex, but I (partially) managed to write it as a standard JS regex:
^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$
When I run the following code in Ruby it gives me an error:
text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub( ^(https?://www\.facebook\.com/(?:video\.php\?v=\d+|.*?/videos/\d+))$ )
Upvotes: 0
Views: 222
Reputation: 163577
You might use:
^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$
That would match
The beginning of the string and the start of the URL:
^https?:\/\/www\.facebook\.com\/
Followed by:
.*?            # Match any character zero or more times
video          # Match video
(?:            # Non capturing group
  s            # Match s
  |            # Or
  \.php        # Match .php
  .*?          # Match any character zero or more times
  [?&]         # Match ? or &
  (?:id|v)=    # Match id or v in non capturing group followed by =
)              # Close non capturing group
\/?            # Match optional /
(              # Capturing group (group 1)
  [^\/&\n]+    # Match not / or & or newline
)              # Close capturing group
.*             # Match any character zero or more times
$              # End of the string
text = "https://www.facebook.com/pili.morillo.56/videos/352355988613922/"
id = text.gsub(/^https?:\/\/www\.facebook\.com\/.*?video(?:s|\.php.*?[?&](?:id|v)=)\/?([^\/&\n]+).*$/, "\\1")
puts id
That will result in: 352355988613922
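One thing to keep in mind with the gsub approach is that gsub returns the input unchanged when the pattern does not match, so a non-video URL would come back as if it were an ID. A minimal sketch of an alternative, assuming you would rather get nil in that case, is to use String#[] with the same pattern. The helper name extract_video_id and the sample URLs are hypothetical:

```ruby
# Same pattern as above, written with %r{} so the slashes need no escaping.
VIDEO_ID_RE = %r{^https?://www\.facebook\.com/.*?video(?:s|\.php.*?[?&](?:id|v)=)/?([^/&\n]+).*$}

# Returns capture group 1 (the video ID), or nil when the URL does not match.
def extract_video_id(url)
  url[VIDEO_ID_RE, 1]
end

# Illustrative inputs only; the last one is not a video URL.
urls = [
  "https://www.facebook.com/pili.morillo.56/videos/352355988613922/",
  "https://www.facebook.com/video.php?id=352355988613922",
  "https://www.facebook.com/video.php?v=352355988613922",
  "https://www.facebook.com/some-page/photos/1"
]

urls.each { |url| puts "#{url} => #{extract_video_id(url).inspect}" }
```

With these inputs the first three URLs yield "352355988613922" and the last one yields nil, which is easier to check for than an unchanged string.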
Upvotes: 0
Reputation: 121010
RE = %r[https://www.facebook.com/(?:.+?/)?video(?:.*?[/=])(.+?)(?:/?\z)]
%w[
https://www.facebook.com/{page-name}/videos/{video-id}/
https://www.facebook.com/{username}/videos/{video-id}/
https://www.facebook.com/video.php?id={video-id}
https://www.facebook.com/video.php?v={video-id}
].map { |url| url[RE, 1] }
#⇒ ["{video-id}", "{video-id}", "{video-id}", "{video-id}"]
Upvotes: 0
Reputation: 782
Here is the regexp I came up with: /(?<=\/videos\/)\d+(?=\/|$)|(?<=[?&]id=)\d+(?=&|$)|(?<=[?&]v=)\d+(?=&|$)/
Broken up by alternative, it looks like this:
(?<=\/videos\/)\d+(?=\/|$)|
(?<=[?&]id=)\d+(?=&|$)|
(?<=[?&]v=)\d+(?=&|$)
Each of the three alternatives follows the same simple structure: (?<=beforeMatch)target(?=afterMatch).
Here is the first as an example:
(?<=\/videos\/) # Positive lookbehind
\d+ # Matching the digits
(?=\/|$) # Positive lookahead
So this means: match \d+ (one or more digits), as long as it is preceded by \/videos\/ and followed by either \/ or the end of the line.
Therefore, we can match by 'id=', 'v=' or 'videos/'.
The full explanation:
(?<=\/videos\/) # Match as long as preceded by '\/videos\/'
\d+ # Matching the id digits
(?=\/|$) # As long as it's followed by '\/' or the EOL
| # Or
(?<=[?&]id=) # Match as long as preceded by '?id=' or '&id='
\d+ # Matching the id digits
(?=&|$) # As long as it's followed by either '&' or the EOL
| # Or
(?<=[?&]v=) # Match as long as preceded by '?v=' or '&v='
\d+ # Matching the id digits
(?=&|$) # As long as it's followed by either '&' or the EOL
Where 'EOL' means end of line.
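As a rough usage sketch: because the lookarounds consume nothing, the whole match is the ID itself, so String#[] can be called with just the pattern and no group index. The helper name video_id and the sample URLs below are hypothetical, and the pattern is written with %r{} only to avoid escaping the slashes:

```ruby
# Lookaround pattern from above; the match itself is the video ID.
LOOKAROUND_RE = %r{(?<=/videos/)\d+(?=/|$)|(?<=[?&]id=)\d+(?=&|$)|(?<=[?&]v=)\d+(?=&|$)}

# Returns the matched digits, or nil when none of the alternatives apply.
def video_id(url)
  url[LOOKAROUND_RE]
end

# Illustrative inputs only.
samples = [
  "https://www.facebook.com/some-page/videos/352355988613922/",
  "https://www.facebook.com/video.php?v=352355988613922",
  "https://www.facebook.com/video.php?id=352355988613922&ref=share",
  "https://www.facebook.com/some-page/photos/1"
]

samples.each { |url| puts "#{url} => #{video_id(url).inspect}" }
```

Note that this pattern does not look at the host at all, so it would also pick up an id= or v= parameter on a non-Facebook URL. If that matters, check the host separately before extracting.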
Upvotes: 1