I have some legacy XML documents stored as blobs in a SQL database, and they are not well-formed XML. I'm reading them in C# (.NET) and would ultimately like to load each one into an XmlDocument.
When I try to do this, I get an XmlException, as expected. Having looked at the documents, they all fail because of undeclared namespace prefixes on specific XML nodes.
I am not concerned with any of the nodes that carry these prefixes, so I can ignore them or throw them away. So before I load the string into an XmlDocument, I would like to strip the prefixes from the string, so that
<tem:GetRouteID>
<tem:PostCode>postcode</tem:PostCode>
<tem:Type>ItemType</tem:Type>
</tem:GetRouteID>
becomes
<GetRouteID>
<PostCode>postcode</PostCode>
<Type>ItemType</Type>
</GetRouteID>
and this
<wsse:Security soapenv:actor="">
<wsse:BinarySecurityToken>token</wsse:BinarySecurityToken>
</wsse:Security>
becomes this:
<Security soapenv:actor="">
<BinarySecurityToken>token</BinarySecurityToken>
</Security>
I have one solution that does this, driven by a configured list of prefixes:
<appSettings>
<add key="STRIP_NAMESPACES" value="wsse;tem" />
</appSettings>
if (STRIP_NAMESPACES != null)
{
    // STRIP_NAMESPACES holds the semicolon-separated prefixes, e.g. "wsse;tem"
    string[] namespaces = STRIP_NAMESPACES.Split(';');
    foreach (string ns in namespaces)
    {
        str2 = str2.Replace("<" + ns + ":", "<");   // Strip the prefix from opening tags
        str2 = str2.Replace("</" + ns + ":", "</"); // Strip the prefix from closing tags
    }
}
Ideally, though, I would like a generic approach, so that I don't have to keep configuring the prefixes I want to remove.
How can I achieve this in C#? I am assuming that a regex is the way to go here.
UPDATE 1
Ria's regex below works well for the requirement above. However, how would I need to change it so that it also turns this
<wsse:Security soapenv:actor="">
<BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>
into this?
<Security>
<BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>
UPDATE 2
I think I've worked out the updated version myself, based on Ria's answer:
<(/?)\w+:(\w+/?) ?(\w+:\w+.*)?>
Upvotes: 6
Views: 9860
Reputation: 10347
UPDATE
For the new issue (prefixes on attribute names), try this more general solution. It strips the prefix from element and attribute names and has no effect on node values:
Regex.Replace(originalXml,
    @"((?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?==""))",
    "${elem}");
Try this regex on my sample XML:
<wsse:Security soapenv:actor="dont match soapenv:actor attrib">
<BinarySecurityToken>authtoken</BinarySecurityToken>
</Security>
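To make the behaviour of the pattern above concrete, here is a minimal sketch that applies it to a fully prefixed variant of the sample. The class and method names (`PrefixStripper`, `Strip`) are hypothetical and chosen only for illustration. The quote inside the verbatim string is doubled (`""`), because that is how a `"` is escaped in a C# `@"..."` literal.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class PrefixStripper
{
    // First alternative: "prefix:name" directly after "<" or "</" (element names).
    // Second alternative: "prefix:name" directly before =" (attribute names).
    private static readonly Regex PrefixedName = new Regex(
        @"(?<=</?)\w+:(?<elem>\w+)|\w+:(?<elem>\w+)(?=="")");

    public static string Strip(string xml)
    {
        // Keep only the local name captured in the "elem" group.
        return PrefixedName.Replace(xml, "${elem}");
    }

    public static void Main()
    {
        string input =
            "<wsse:Security soapenv:actor=\"dont match soapenv:actor attrib\">" +
            "<wsse:BinarySecurityToken>authtoken</wsse:BinarySecurityToken>" +
            "</wsse:Security>";

        // The text inside the attribute value is left alone because it is
        // not followed by =".
        Console.WriteLine(Strip(input));
    }
}
```

Note that `\w` does not match `-` or `.`, so names containing those characters would need a wider character class.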
Try using XSL. You can apply the XSL directly or use the XslTransform class in .NET:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="no"/>
<xsl:template match="/|comment()|processing-instruction()">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*">
<xsl:element name="{local-name()}">
<xsl:apply-templates select="@*|node()"/>
</xsl:element>
</xsl:template>
<xsl:template match="@*">
<xsl:attribute name="{local-name()}">
<xsl:value-of select="."/>
</xsl:attribute>
</xsl:template>
</xsl:stylesheet>
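As a rough sketch of how the stylesheet above might be applied from C#, the following uses `XslCompiledTransform`, which replaced the older `XslTransform` class from .NET 2.0 onwards. The class name `XsltNamespaceStripper` and the namespace URI `http://example.com/tem` are assumptions made for the example. An XSLT processor has to parse its input first, so this sketch assumes a document whose prefixes are declared.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper, for illustration only.
public static class XsltNamespaceStripper
{
    // The stylesheet from the answer, embedded as a verbatim string.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" indent=""no""/>
  <xsl:template match=""/|comment()|processing-instruction()"">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match=""*"">
    <xsl:element name=""{local-name()}"">
      <xsl:apply-templates select=""@*|node()""/>
    </xsl:element>
  </xsl:template>
  <xsl:template match=""@*"">
    <xsl:attribute name=""{local-name()}"">
      <xsl:value-of select="".""/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string xml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Assumed input: the prefix is declared, so the reader can parse it.
        string input =
            "<tem:GetRouteID xmlns:tem=\"http://example.com/tem\">" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "</tem:GetRouteID>";
        Console.WriteLine(Transform(input));
    }
}
```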
Or try this regex:
var finalXml = Regex.Replace(originalXml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");
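For context, a minimal sketch of how that one-liner could sit in front of `XmlDocument.LoadXml`, using the `tem` sample from the question. The class and method names (`ElementPrefixDemo`, `LoadWithoutElementPrefixes`) are hypothetical. This pattern only matches tags that have no attributes, and it leaves attribute prefixes such as `soapenv:actor` in place, so documents containing those would still fail to load without the more general pattern.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class ElementPrefixDemo
{
    public static XmlDocument LoadWithoutElementPrefixes(string rawXml)
    {
        // $1 is the optional "/" of a closing tag, $2 is the local name
        // (plus a trailing "/" for self-closing tags).
        string cleaned = Regex.Replace(rawXml, @"<(/?)\w+:(\w+/?)>", "<$1$2>");

        var doc = new XmlDocument();
        doc.LoadXml(cleaned); // Still throws XmlException if other errors remain.
        return doc;
    }

    public static void Main()
    {
        string raw =
            "<tem:GetRouteID>" +
            "<tem:PostCode>postcode</tem:PostCode>" +
            "<tem:Type>ItemType</tem:Type>" +
            "</tem:GetRouteID>";

        XmlDocument doc = LoadWithoutElementPrefixes(raw);
        Console.WriteLine(doc.DocumentElement.Name);
    }
}
```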
Upvotes: 9