Reputation: 7768
I have a large string and it might have the following:
<div id="Specs" class="plinks">
<div id="Specs" class="plinks2">
<div id="Specs" class="sdfsf">
<div id="Specs" class="ANY-OTHER_NAME">
How can I replace values in the string from anything above to:
<div id="Specs" class="">
this is what I came up with, but it does not work:
string source = "bunch of text";
string regex = "<div id=\"Specs\" class=[\"']([^\"']*)[\"']>";
string regexReplaceTo = "<div id=\"Specs\" class=\"\">";
string output = Regex.Replace(source, regex, regexReplaceTo);
Upvotes: 0
Views: 995
Reputation: 114631
If your input isn't XML compliant, which most HTML isn't, then you can use the HTML Agility Pack to parse the HTML and manipulate the contents. With the HTML Agility PAck, combined with Linq or Xpath, the order of your attributes no longer matters (which it does when you use Regex) and the overall stability of your solution increases a lot.
Using the HTML Agility Pack (project page, nuget), this does the trick:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("your html here");
// or doc.Load(stream);
var nodes = doc.DocumentNode.DescendantNodes("div").Where(div => div.Id == "Specs");
foreach (var node in nodes)
{
var classAttribute = node.Attributes["class"];
if (classAttribute != null)
{
classAttribute.Value = string.Empty;
}
}
var fixedText = doc.DocumentNode.OuterHtml;
//doc.Save(/* stream */);
Upvotes: 2
Reputation: 54628
Looks like another case of http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html. What happens to the following valid tags with a Regex
?
<div class="reversed" id="Specs">
<div id="Specs" class="additionalSpaces" >
<div id="Specs" class="additionalAttributes" style="" >
I don't see a how using Linq2Xml wouldn't work with any combination:
XElement root = XElement.Parse(xml); // XDocument.Load(xmlFile).Root
var specsDivs = root.Descendants()
.Where(e => e.Name == "div"
&& e.Attributes.Any(a => a.Name == "id")
&& e.Attributes.First(a => a.Name == "id").Value == "Specs"
&& e.Attributes.Any(a => a.Name == "class"));
foreach(var div in specsDivs)
{
div.Attributes.First(a => a.Name == "class").value = string.Empty;
}
string newXml = root.ToString()
Upvotes: 4
Reputation: 22810
What about...
class=\"[A-Za-z0-9_\-]+\"
class=\"\"
This way, we ignore the first part (id="Specs"
, etc) and
just replace the class name... with nothing.
Upvotes: 4