Reputation: 11247
I want to extract the Twitter status URL from a text URL inside a post and then use that URL to get the embed code from Twitter using their API. I have a problem with the URL extraction using JavaScript and regex.
The Regex ignores URLs that are within single or double quotes so it won't render the code inside a hyperlink. I need to convert that URL to the embed HTML code I get from the Twitter API.
Javascript code (original Regex code from this question but modified to ignore text that starts with single or double quotes):
var str = '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>';
var matched = str.match(/^[^'"]*http(s)?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(es)?\/(\d+)$/);
var url = matched[0];
<p> is just an example of HTML text that can surround the URL. It can be any string, but if the URL is within quotation marks, it should be ignored.
I have two problems that I couldn't solve.
1) In the matched[0], I get also the characters before the URL. How can I get only the URL?
2) How to replace the URL in the source string with another arbitrary text (Will eventually be the widget HTML code that I get from Twitter)?
Results expected:
1) var url should be: "https://twitter.com/oppomobileindia/status/798397636780953600"
2) var str should be: "<p>this is an arbitrary text that replaced the original url</p>"
(or any other text, for that matter)
Upvotes: 2
Views: 2367
Reputation: 626845
To get the URL value, you can add a capturing group around the URL pattern:
/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/
and access [2], capturing group #2.
Regex details:
(^|[^'"]) - Capturing group #1: either start of string (^) or any char other than " and ' ([^'"])
(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)) - Capturing group #2:
  https?:\/\/twitter\.com\/ - a literal https://twitter.com/ or http://twitter.com/ text
  (?:#!\/)? - an optional (1 or 0 occurrence) sequence of #!/
  (\w+) - Capturing group #3: one or more letters/digits or _
  \/status(?:es)?\/ - literal /status/ or /statuses/ text
  (\d+) - Capturing group #4: one or more digits.
To replace just the URL, you just need to use capturing groups and backreferences to restore the text inside the capturing groups you need to keep:
var replaced = str.replace(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/, '$1NEW_CODE');
See JS demo:
var str = '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>';
var matched = str.match(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/);
var url = matched[2];
console.log(url);
var res = str.replace(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/, '$1NEW_CODE');
console.log(res);
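If the post can contain more than one status URL, the same pattern can be given the g flag and combined with a replacer function. The following is only a sketch under that assumption: replaceTweetUrls and makeReplacement are hypothetical names, and the bracketed placeholder stands in for whatever embed HTML you get back from Twitter. A replacer function also means that any $ characters in the returned text are inserted literally rather than being treated as backreferences.

```javascript
// Same pattern as above, plus the g flag so every unquoted status URL is matched.
var TWEET_URL = /(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/g;

// Hypothetical helper: replaces each unquoted status URL with the text
// returned by makeReplacement(url, user, id).
function replaceTweetUrls(html, makeReplacement) {
  return html.replace(TWEET_URL, function (match, prefix, url, user, id) {
    // Keep the character captured before the URL (group 1); swap only the URL.
    return prefix + makeReplacement(url, user, id);
  });
}

var input =
  '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>' +
  '<a href="https://twitter.com/oppomobileindia/status/798397636780953600">link</a>';

var output = replaceTweetUrls(input, function (url, user, id) {
  // Placeholder text; in practice this would be the embed HTML for the tweet.
  return '[tweet ' + id + ' by ' + user + ']';
});

console.log(output);
// <p>[tweet 798397636780953600 by oppomobileindia]</p><a href="https://twitter.com/oppomobileindia/status/798397636780953600">link</a>
```

The URL inside the href attribute is left untouched because it is preceded by a double quote, which group 1 refuses to match.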
Upvotes: 4