Reputation: 43639
My text is something like:
<a href="http://example.com/test this now">Stuff</a>
More stuff
<a href="http://example.com/more?stuff goes here">more</a>
I want to replace the value inside each href with the result of a function that URL-encodes just the URL portion.
How would I go about this?
UPDATE Here's what I've tried:
postdata.comment.content = postdata.comment.content.replace(/href=\"(.+?)\"/g, function(match, p1) {
return encodeURI(p1);
});
This does not do what I had hoped.
Expected result is:
<a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>
More stuff
<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>
Upvotes: 4
Views: 7230
Reputation: 178413
Where is this running? If you have a DOM, you are MUCH better off looping over document.links or document.querySelectorAll("a") than running a regex on HTML. Also, you likely do not want to encode EVERYTHING, only the search (query string) part:
var allLinks = document.querySelectorAll("a");
for (var i=0;i<allLinks.length;i++) {
var search = allLinks[i].search;
if (search) {
allLinks[i].search="?"+search.substring(1).replace(/stuff/,encodeURIComponent("something"));
}
}
In case you really DO want to have encoded hrefs, then:
for (var i=0;i<allLinks.length;i++) {
var href = allLinks[i].href;
if (href) {
allLinks[i].href=href.replace(/stuff/,encodeURIComponent("something"));
}
}
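If no DOM is available (for example, when the code runs in Node), the same idea of encoding only what needs encoding can be sketched with the standard URL class. This is a minimal sketch and not part of the answer above: the helper name normalizeHref is hypothetical, and it assumes each href is an absolute URL.

```javascript
// Hypothetical helper (not from the answer above): let the URL parser
// percent-encode unsafe characters such as spaces, while leaving
// "://", "/" and "?" untouched.
function normalizeHref(href) {
  try {
    return new URL(href).href;
  } catch (e) {
    // Not an absolute URL: return it unchanged rather than guess.
    return href;
  }
}

console.log(normalizeHref("http://example.com/test this now"));
console.log(normalizeHref("http://example.com/more?stuff goes here"));
```

Unlike encodeURIComponent, this keeps the result usable as an absolute link, because the scheme and path separators are left as they are.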
Upvotes: 4
Reputation:
Disclaimer: Don't use regex to parse HTML (too many reasons to list here).
But, if you insist, this might work.
Find /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/
Replace $1$2 + someEncoding( $3 ) + $2$4
Expanded
( # (1 start)
< [\w:]+
(?: [^>"'] | " [^"]* " | ' [^']* ' )*?
\s
href \s* = \s*
) # (1 end)
(?:
( ['"] ) # (2)
( # (3 start)
[\S\s]*?
) # (3 end)
\2
)
( # (4 start)
(?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
>
) # (4 end)
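In JavaScript, the find/replace above could be applied with String.prototype.replace and a callback. The following is a minimal sketch under assumptions: encodeURIComponent stands in for someEncoding, the g flag is added so every tag is processed, and the function name encodeHrefs is hypothetical.

```javascript
// The pattern from the "Find" line, with the g flag added (an assumption)
// so that every matching tag in the string is rewritten.
const hrefPattern = /(<[\w:]+(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*)(?:(['"])([\S\s]*?)\2)((?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>)/g;

// Hypothetical wrapper: rebuilds $1$2 + someEncoding($3) + $2$4,
// using encodeURIComponent as the stand-in encoding.
function encodeHrefs(html) {
  return html.replace(hrefPattern, (match, before, quote, url, after) =>
    before + quote + encodeURIComponent(url) + quote + after);
}

const sample = '<a href="http://example.com/test this now">Stuff</a>';
console.log(encodeHrefs(sample));
```

The callback receives the four capture groups in order, so the original quote character (group 2) is reused on both sides of the encoded URL.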
Upvotes: 4
Reputation: 288660
For the encoding, you can use encodeURIComponent:
var links = document.querySelectorAll('a');
for(var i=0; i<links.length; ++i)
links[i].href = encodeURIComponent(links[i].href);
<a href="http://example.com/test this now">Stuff</a>
More stuff
<a href="http://example.com/more?stuff goes here">more</a>
If you only have an HTML string instead of DOM elements, then don't use regular expressions. Parse your string with a DOM parser instead.
var codeString = '<a href="http://example.com/test this now">Stuff</a>\nMore stuff\n<a href="http://example.com/more?stuff goes here">more</a>';
var doc = new DOMParser().parseFromString(codeString, "text/html");
var links = doc.querySelectorAll('a');
for(var i=0; i<links.length; ++i)
links[i].href = encodeURIComponent(links[i].href);
document.querySelector('code').textContent = doc.body.innerHTML;
<pre><code></code></pre>
And note that if you encode the URL entirely, it will be treated as a relative URL.
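The relative-URL behaviour can be checked without a browser by resolving the encoded string against a base URL with the standard URL class. This is a small sketch; the base URL is a made-up stand-in for the address of the page that contains the link.

```javascript
// Fully encode an absolute URL, as in the loops above.
const encoded = encodeURIComponent("http://example.com/test this now");

// Assumption: a made-up page address to resolve the link against.
const pageUrl = "http://host.example/dir/page.html";

// With "://" encoded, the string no longer starts with a scheme,
// so it is resolved relative to the page's own location.
const resolved = new URL(encoded, pageUrl);
console.log(resolved.hostname);
console.log(resolved.href);
```

The resolved link points at the page's own host (host.example here), not at example.com, which is rarely what a fully encoded href is meant to do.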
Upvotes: 6
Reputation: 92894
Your expected string "http%3A%2F%2Fexample.com%2Ftest%20this%20now" corresponds to encodeURIComponent("http://example.com/test this now"), not to the encodeURI function:
var str = '<a href="http://example.com/test this now">Stuff</a>More stuff<a href="http://example.com/more?stuff goes here">more</a>';
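str = str.replace(/href="(.+?)"/g, function (m, p1) {
// Put the href="..." wrapper back; returning only the encoded URL would drop the attribute name.
return 'href="' + encodeURIComponent(p1) + '"';
});
console.log(str);
// <a href="http%3A%2F%2Fexample.com%2Ftest%20this%20now">Stuff</a>More stuff<a href="http%3A%2F%2Fexample.com%2Fmore%3Fstuff%20goes%20here">more</a>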
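To make the difference between the two functions concrete, here is a short side-by-side sketch using one of the URLs from the question. The variable names are illustrative only.

```javascript
const url = "http://example.com/more?stuff goes here";

// encodeURI leaves URL delimiters such as ":", "/" and "?" alone
// and only escapes characters like the spaces.
const viaEncodeURI = encodeURI(url);

// encodeURIComponent escapes the delimiters as well.
const viaEncodeURIComponent = encodeURIComponent(url);

console.log(viaEncodeURI);
console.log(viaEncodeURIComponent);
```

Only the encodeURIComponent result has the %3A%2F%2F form shown in the question's expected output.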
Upvotes: 2
Reputation: 87233
The regex matches the complete attribute href="....", but the replacement returns only the encoded URL, so the href=" prefix and the closing quote are lost. Add them back in the replacement, and use encodeURIComponent() to encode the complete URL.
var string = '<a href="http://example.com/test this now">Stuff</a>';
string = string.replace(/href="(.*?)"/, function(m, $1) {
return 'href="' + encodeURIComponent($1) + '"';
// The 'href="' prefix and the closing '"' above are the parts that were missing.
});
var str = `<a href="http://example.com/test this now">Stuff</a>
More stuff
<a href="http://example.com/more?stuff goes here">more</a>`;
str = str.replace(/href="(.*?)"/g, (m, $1) => 'href="' + encodeURIComponent($1) + '"');
console.log(str);
document.body.textContent = str;
Upvotes: 8