Liron Harel

Reputation: 11247

Extract Twitter status URL using Regex and convert to another string using Javascript

I want to extract a Twitter status URL from the text of a post and then use that URL to get the embed code from Twitter through their API. My problem is with extracting the URL using JavaScript and a regex.

The regex should ignore URLs that are inside single or double quotes, so that the embed code is not rendered inside a hyperlink. Every other URL needs to be converted to the embed HTML code I get from the Twitter API.

JavaScript code (the original regex is from this question, modified to ignore text that starts with a single or double quote):

var str = '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>';
var matched = str.match(/^[^'"]*http(s)?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(es)?\/(\d+)$/);

var url = matched[0];

<p> is just an example of HTML text that can surround the URL. It can be any string, but if the URL is inside quotation marks, it should be ignored.

I have two problems that I couldn't solve.

1) matched[0] also contains the characters before the URL. How can I get only the URL?

2) How can I replace the URL in the source string with other arbitrary text (eventually the widget HTML code that I get from Twitter)?

Results expected:

1) var url should be: "https://twitter.com/oppomobileindia/status/798397636780953600"

2) var str should be: "<p>this is an arbitrary text that replaced the original url</p>" (or any other text, for that matter)

Upvotes: 2

Views: 2367

Answers (1)

Wiktor Stribiżew

Reputation: 626845

To get the URL value, you can add a capturing group around the URL pattern:

/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/

and access matched[2], the value of capturing group #2.

Regex details:

  • (^|[^'"]) - Capturing group #1: either start of string (^) or any char other than " and ' ([^'"])
  • (https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)) - Capturing group #2:
    • https?:\/\/twitter\.com\/ - a literal https://twitter.com/ or http://twitter.com/ text
    • (?:#!\/)? - an optional (1 or 0 occurrences) sequence of #!/
    • (\w+) - Capturing group #3: one or more letters/digits or _
    • \/status(?:es)?\/ - literal /status/ or /statuses/ text
    • (\d+) - Capturing group #4: one or more digits.
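The group numbering above can be checked with a short sketch. The names statusRe, sample, groups, quoted and quotedMatch are only illustrative and are not part of the original question:

```javascript
// Same pattern as above; the variable names here are hypothetical.
const statusRe = /(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/;

const sample = '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>';
const m = sample.match(statusRe);

// Map each capturing group to a readable name (null when nothing matches).
const groups = m ? { prefix: m[1], url: m[2], user: m[3], id: m[4] } : null;
console.log(groups);

// A URL that directly follows a quote is skipped, because group #1
// can only match the start of the string or a non-quote character.
const quoted = '<a href="https://twitter.com/oppomobileindia/status/798397636780953600">link</a>';
const quotedMatch = quoted.match(statusRe);
console.log(quotedMatch);
```

Here group #1 holds the single character before the URL (">" in this sample), which is why it has to be put back with $1 when replacing.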

To replace only the URL, use a backreference in the replacement string to restore the text of the capturing group you need to keep (here, group #1):

var replaced = str.replace(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/, '$1NEW_CODE');

See JS demo:

var str = '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>';
var matched = str.match(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/);
var url = matched[2];
console.log(url);
var res = str.replace(/(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/, '$1NEW_CODE');
console.log(res);
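If a post can contain several status URLs, or the replacement HTML may itself contain $ sequences, one possible variation is to add the g flag and pass a function as the replacement, since a function's return value is inserted literally. This is only a sketch: replaceStatusUrls and its render callback are hypothetical names, and render is assumed to be synchronous, so embed code fetched from the Twitter API would have to be available before the call.

```javascript
// Hypothetical helper: replaces every unquoted status URL in `text`.
// `render` is an assumed synchronous callback returning the replacement for one URL.
function replaceStatusUrls(text, render) {
  const re = /(^|[^'"])(https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+))/g;
  // Keep group #1 (the character before the URL) and swap in the rendered text.
  return text.replace(re, (whole, prefix, url, user, id) => prefix + render(url, user, id));
}

const input =
  '<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>' +
  '<a href="https://twitter.com/oppomobileindia/status/798397636780953600">quoted</a>';

const output = replaceStatusUrls(input, (url, user, id) => '[tweet ' + id + ' by ' + user + ']');
console.log(output);
```

The unquoted URL inside the <p> element is replaced, while the one inside the href attribute is left untouched.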

Upvotes: 4
