Reputation: 43
I am trying to do some HTML scraping with JavaScript, and I would like to take an <a href> link and turn it into a hyperlink in a Discord embed. I am having trouble with regex; I am finding it very difficult to learn.
I assume I will also need another regex to capture the whole tag so that I can replace it with my desired output?
This is an example of the raw HTML that I have:
An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>
To make this readable within a Discord embed, I am looking for the following output:
An **example**, also known as a [**example type**](https://www.example.com/example%20type)
I have tried extracting the URL via regex, and I can match it. However, I am having issues extracting both the link and the link text (I think it's called the target? The 'example type' in the example) and then replacing the string with my desired output. I have the following (https://regexr.com/73574):
/href="[^"]+/g
This matches href="https://www.example.com/example%20type, which feels like a very early step: it includes 'href' in the match, and it does not capture the target.
EDIT: I apologise, I did not think about additional cases. What if the string has multiple links, with text after them? For example:
An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a> is the first example, and now I have <a href="https://www.example.com/second">second</a> example
with a desired output of:
An **example**, also known as a [**example type**](https://www.example.com/example%20type) is the first example, and now I have [**second**](https://www.example.com/second) example
Upvotes: 1
Views: 1710
Reputation: 43
Solution:
const input = 'An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a> first and second here <a href="https://www.example.com/no%20u">no u</a> and then done noice';
const output = input.replace(/<a href="([^"]+)">([^<]+)<\/a>/g, '[**$2**]($1)')
console.log(output);
Regex breakdown:
<a href=" - Matches the opening <a href=" of the HTML tag.
([^"]+) - A capturing group that matches one or more characters that are not double quotes.
"> - Matches the closing double quote and the '>' that ends the opening tag.
([^<]+) - Another capturing group that matches one or more characters that are not a less-than symbol.
<\/a> - Matches the closing HTML tag.
I then use the replace method seen in my output variable.
The replace call takes two arguments (regex, replaceWith).
The first argument is the regex. The second argument, [**$2**]($1), uses the capturing groups from the regex: $1 provides the link from the HTML tag, and $2 provides the HTML target (the text after the link; for example, in my input variable the first target is 'example type').
The only important bits in this argument are $2 and $1; I just wanted to display them in a certain way: [**target**](link).
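To make the role of $1 and $2 more concrete, here is a minimal sketch that lists what each capturing group holds before doing the replacement. The sample string and the variable names (linkPattern, sample, groups, converted) are made up for illustration; the pattern is the same one used above.

```javascript
// Same pattern as in the answer above; the sample string is invented for illustration.
const linkPattern = /<a href="([^"]+)">([^<]+)<\/a>/g;
const sample = 'See <a href="https://www.example.com/a">first</a> and <a href="https://www.example.com/b">second</a>.';

// matchAll yields one match per link: index 1 is $1 (the URL), index 2 is $2 (the link text).
const groups = [...sample.matchAll(linkPattern)].map((m) => ({ url: m[1], text: m[2] }));
console.log(groups);

// Because of the g flag, replace substitutes $1 and $2 for every match, not just the first.
const converted = sample.replace(linkPattern, '[**$2**]($1)');
console.log(converted);
```

Logging the groups first is a handy way to check that the regex captures what you expect before you rely on it in a replacement string.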
Upvotes: 2
Reputation: 401
You can use regular expression groups to capture things that interest you. My regular expression here might be far from perfect but I don't think that's important here - it shows you a way and you can always improve it if needed.
Things you have to do:
Here's a quick code example of that:
const anchorRegex = /(<a\shref="([^"]+)">(.+?)<\/a>)/i;
const textToBeParsed = `An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a>`;
const parseText = (text) => {
  // Run the regex against the function argument, not the outer variable.
  const matches = anchorRegex.exec(text);
  if (!matches) {
    console.warn("Something went wrong...");
    return;
  }
  // Group 1 is the whole anchor tag, group 2 the URL, group 3 the link text.
  const [, fullAnchorTag, anchorUrl, anchorText] = matches;
  // Swap the tag for the Markdown link in place so the surrounding text keeps its order.
  // A replacer function avoids special handling of "$" sequences in the URL or text.
  return text.replace(fullAnchorTag, () => `[**${anchorText}**](${anchorUrl})`);
};
console.log(parseText(textToBeParsed));
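The function above only handles the first anchor it finds. For the multi-link case from the question's EDIT, one possible variation is to add the g flag and pass a replacer function to replace. This is a sketch rather than a definitive solution: the helper name anchorsToMarkdown and the variable names are hypothetical, and it assumes the anchors have exactly the shape <a href="...">text</a> with no other attributes.

```javascript
// Hypothetical helper; the pattern is the one above with the g flag added
// and the outer group dropped, since the replacer already receives the full match.
const anchorPattern = /<a\shref="([^"]+)">(.+?)<\/a>/gi;

const anchorsToMarkdown = (text) =>
  // The replacer is called once per match: full match first, then each capture group.
  text.replace(anchorPattern, (_match, url, label) => `[**${label}**](${url})`);

const multi = 'An **example**, also known as a <a href="https://www.example.com/example%20type">example type</a> is the first example, and now I have <a href="https://www.example.com/second">second</a> example';
console.log(anchorsToMarkdown(multi));
```

Text without any anchors passes through unchanged, so there is no need for a separate "no match" branch in this version.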
Upvotes: 1
Reputation: 587
Try this: (?<=href=")[^"]*
By using a lookbehind, you can verify that the text before the match is href=" without including it in the match.
Demo: https://regex101.com/r/2qMnPt/1
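As a rough illustration of how that pattern behaves in JavaScript (assuming a runtime that supports lookbehind assertions, which were added in ES2018), the sketch below collects every URL from a sample string. The variable names html and urls are made up for this example. Note that this only extracts the URLs; it does not capture the link text, so a replacement still needs capturing groups.

```javascript
// Sample input, invented for illustration.
const html = 'An <a href="https://www.example.com/example%20type">example type</a> and <a href="https://www.example.com/second">second</a>';

// The lookbehind (?<=href=") asserts that href=" comes right before the match
// without making it part of the match itself.
const urls = html.match(/(?<=href=")[^"]*/g);
console.log(urls);
```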
Upvotes: 1