Reputation: 1023
I'm trying to write a regex that will find the width and height attributes in a string (which will always be an HTML iframe) and replace their values.
What I have is a string where ### could be any value, and not necessarily always 3 digits.
string iFrame = "<iframe width=\"###\" height=\"###\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";
I want to end up with set values for the width and height:
<iframe width="315" height="215" src="http://www.youtube.com/embed/xxxxxx" frameborder="0" allowfullscreen></iframe>
I tried this, but am not good with regular expressions:
iFrame = Regex.Replace(iFrame, "width=\".*\"", "width=\"315\"");
iFrame = Regex.Replace(iFrame, "height=\".*\"", "height=\"215\"");
which resulted in:
<iframe width="315" allowfullscreen></iframe>
which is not what I want. Can someone help me?
Upvotes: 1
Views: 4270
Reputation: 6249
Change your patterns to these:
"width=\"([0-9]{1,4})\""
and
"height=\"([0-9]{1,4})\""
Basically, you were using .* which performs a greedy match, meaning it grabs as many characters as possible, all the way to the last " in the string. The patterns above instead look for a digit character [0-9] that repeats between 1 and 4 times {1,4}, which is what you are really looking for.
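To make the difference concrete, here is a minimal sketch that runs both the greedy pattern from the question and the digit-bounded pattern on a sample string. The class name IframeResizer, the method Resize, and the sample values 640 and 360 are made up for illustration; they are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IframeResizer
{
    // Hypothetical helper: replaces the numeric width/height values
    // using the digit-bounded patterns suggested above.
    public static string Resize(string iframe, int width, int height)
    {
        iframe = Regex.Replace(iframe, "width=\"[0-9]{1,4}\"", $"width=\"{width}\"");
        iframe = Regex.Replace(iframe, "height=\"[0-9]{1,4}\"", $"height=\"{height}\"");
        return iframe;
    }

    public static void Main()
    {
        // Sample input; 640 and 360 are assumed values standing in for ###.
        string sample = "<iframe width=\"640\" height=\"360\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";

        // Greedy .* runs to the LAST quote in the string, swallowing
        // height, src and frameborder along the way.
        string greedy = Regex.Replace(sample, "width=\".*\"", "width=\"315\"");
        Console.WriteLine(greedy);
        // prints: <iframe width="315" allowfullscreen></iframe>

        // The bounded digit pattern stops at the closing quote of each value.
        Console.WriteLine(Resize(sample, 315, 215));
        // prints: <iframe width="315" height="215" src="http://www.youtube.com/embed/xxxxxx" frameborder="0" allowfullscreen></iframe>
    }
}
```

Note that {1,4} will not match a value longer than four digits or one with a unit such as 100%, so the attribute would be left unchanged in those cases.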
Upvotes: 9
Reputation: 86
I agree that this isn't the best way to work with HTML. The problem with your example is the .* in your regex, which is greedy and matches all characters, including spaces, up to the last " in the string. Change it to the code below, which matches only non-whitespace characters between the quotes.
iFrame = Regex.Replace(iFrame, @"width=""[^\s]*""", "width=\"315\"");
iFrame = Regex.Replace(iFrame, @"height=""[^\s]*""", "height=\"215\"");
Upvotes: 3
Reputation: 499212
You are better off using the HTML Agility Pack to parse and query HTML. It handles HTML fragments well.
RegEx is not a good solution for parsing HTML, as this SO answer may convince you.
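As a rough sketch of what that could look like, assuming the HtmlAgilityPack NuGet package is installed, the snippet below loads the fragment, finds the first iframe, and sets the two attributes. The class name IframeAttributeSetter and the method SetSize are hypothetical names for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class IframeAttributeSetter
{
    // Hypothetical helper: sets width and height on the first <iframe> found.
    public static string SetSize(string html, string width, string height)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath query for the first iframe element anywhere in the fragment.
        HtmlNode iframe = doc.DocumentNode.SelectSingleNode("//iframe");
        if (iframe == null)
        {
            return html; // nothing to change
        }

        // SetAttributeValue updates the attribute, or adds it if missing.
        iframe.SetAttributeValue("width", width);
        iframe.SetAttributeValue("height", height);

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Sample input; 640 and 360 are assumed values standing in for ###.
        string sample = "<iframe width=\"640\" height=\"360\" src=\"http://www.youtube.com/embed/xxxxxx\" frameborder=\"0\" allowfullscreen></iframe>";
        Console.WriteLine(SetSize(sample, "315", "215"));
    }
}
```

The serialized output may not be byte-for-byte identical to the input apart from the two values (for example, how a valueless attribute such as allowfullscreen is written out), so check the result if exact formatting matters.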
Upvotes: 3