fregas

Reputation: 3250

Need a regular expression to get rid of parentheses in an HTML image tag filename

So say I have some html with an image tag like this:

<p> (1) some image is below:
<img src="/somwhere/filename_(1).jpg">
</p>

I want a regex that will just get rid of the parentheses in the filename so my HTML will look like this:

<p> (1) some image is below:
<img src="/somwhere/filename_1.jpg">
</p>

Does anyone know how to do this? My programming language is C#, if that makes a difference...

I will be eternally grateful and send some very nice karma your way. :)

Upvotes: 4

Views: 563

Answers (5)

Alan Moore

Reputation: 75222

Nick's solution is fine if the file names always match that format, but this one matches any parenthesis, anywhere in the attribute:

s = Regex.Replace(s, @"(?i)(?<=<img\s+[^>]*\bsrc\s*=\s*""[^""]*)[()]", "");

The lookbehind ensures that the match occurs inside the src attribute of an img tag. It assumes the attribute is enclosed in double-quotes (quotation marks); if you need to allow for single-quotes (apostrophes) or no quotes at all, the regex gets much more complicated. I'll post that if you need it.
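As a rough end-to-end sketch of the above (the sample string is the one from the question; the variable names html, pattern and cleaned are only for illustration), the text "(1)" in the paragraph is left alone because it is not preceded by an img tag's src attribute:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Sample markup from the question.
        string html = "<p> (1) some image is below:\n<img src=\"/somwhere/filename_(1).jpg\">\n</p>";

        // The lookbehind only succeeds when the parenthesis sits inside a
        // double-quoted src value of an img tag.
        string pattern = @"(?i)(?<=<img\s+[^>]*\bsrc\s*=\s*""[^""]*)[()]";

        string cleaned = Regex.Replace(html, pattern, "");
        Console.WriteLine(cleaned);
        // prints:
        // <p> (1) some image is below:
        // <img src="/somwhere/filename_1.jpg">
        // </p>
    }
}
```

This relies on .NET allowing variable-length lookbehind, which many other regex flavors do not support.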

Upvotes: 1

t0mm13b

Reputation: 34592

I suspect your job would be much easier with the HTML Agility Pack than with regexes. Judging from the other answers, letting a parser handle the HTML will make what you are trying to do a lot easier.
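A minimal sketch of what that might look like, assuming the HtmlAgilityPack NuGet package is referenced (it is a third-party library, not part of .NET). The sample string is taken from the question and the variable names are only illustrative:

```csharp
using System;
using HtmlAgilityPack; // third-party package, assumed to be installed via NuGet

class Program
{
    static void Main()
    {
        string html = "<p> (1) some image is below:\n<img src=\"/somwhere/filename_(1).jpg\">\n</p>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty list) when nothing matches.
        var images = doc.DocumentNode.SelectNodes("//img[@src]");
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                // Only the src value is edited; text elsewhere is untouched.
                string src = img.GetAttributeValue("src", "");
                img.SetAttributeValue("src", src.Replace("(", "").Replace(")", ""));
            }
        }

        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

The parser may write the markup back out slightly differently from the input, so it is worth checking the output if exact formatting matters.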

Hope this helps, Best regards, Tom.

Upvotes: 1

Jay

Reputation: 57909

Regex.Replace(some_input, @"(?<=<\s*img\s*src\s*=\s*""[^""]*?)(?:\(|\))(?=[^""]*?""\s*\/?\s*?>)", "");

Finds ( or ) preceded by <img src =" and, optionally, text (with any whitespace combination, though I didn't include newline), and followed by optional text and "> or "/>, again with any whitespace combination, and replaces them with nothingness.
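To illustrate what that description implies, here is a small sketch (the test strings are made up for the example). As far as I can tell, the pattern only fires when src is the first and last attribute of the tag:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // The pattern from this answer, unchanged.
    const string Pattern =
        @"(?<=<\s*img\s*src\s*=\s*""[^""]*?)(?:\(|\))(?=[^""]*?""\s*\/?\s*?>)";

    static void Main()
    {
        // src is the only attribute: both parentheses are removed.
        Console.WriteLine(Regex.Replace("<img src=\"/a_(1).jpg\" />", Pattern, ""));
        // prints: <img src="/a_1.jpg" />

        // Another attribute follows src: the lookahead fails, nothing changes.
        Console.WriteLine(Regex.Replace("<img src=\"/a_(1).jpg\" alt=\"x\">", Pattern, ""));
        // prints: <img src="/a_(1).jpg" alt="x">

        // An attribute comes before src: the lookbehind fails, nothing changes.
        Console.WriteLine(Regex.Replace("<img alt=\"x\" src=\"/a_(1).jpg\">", Pattern, ""));
        // prints: <img alt="x" src="/a_(1).jpg">
    }
}
```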

Upvotes: 0

Nick Higgs

Reputation: 1702

This (rather dense) regex should do it:

string s = Regex.Replace(input, @"(<img\s+[^>]*src=""[^""]*)\((\d+)\)([^""]*""[^>]*>)", "$1$2$3");
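Since the pattern is dense, here is the same regex spread over several lines with comments, as a sketch. It uses RegexOptions.IgnorePatternWhitespace so the layout and # comments are ignored by the engine; the sample input is the one from the question:

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string input = "<p> (1) some image is below:\n<img src=\"/somwhere/filename_(1).jpg\">\n</p>";

        // Same three groups as the one-line version; only the layout differs.
        string pattern = @"
            (<img\s+[^>]*src=""[^""]*)  # $1: start of the tag and the src value before the paren
            \((\d+)\)                   # $2: digits wrapped in literal parentheses
            ([^""]*""[^>]*>)            # $3: remainder of the src value and of the tag
        ";

        // Re-emit the three groups, which drops the two literal parentheses.
        string s = Regex.Replace(input, pattern, "$1$2$3", RegexOptions.IgnorePatternWhitespace);
        Console.WriteLine(s);
        // prints:
        // <p> (1) some image is below:
        // <img src="/somwhere/filename_1.jpg">
        // </p>
    }
}
```

Because the pattern asks for \d+ between the parentheses, a name like filename_(a).jpg would be left alone, and only one parenthesized number per img tag is rewritten in a single pass.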

Upvotes: 1

AndiDog

Reputation: 70118

In this simple case, you could just use string.Replace, for example:

string imgFilename = "/somewhere/image_(1).jpg";
imgFilename = imgFilename.Replace("(", "").Replace(")", "");

Or do you need a regex for replacing the complete tag inside an HTML string?

Upvotes: 0
