collimarco

Reputation: 35450

Clear Unwanted Namespaces with LibXML-Ruby

I would like to parse an Atom Feed and create an Atom-compliant cache of each Entry.

The problem is that some feeds (this one for example) have many namespaces other than the Atom one.

Is it possible to keep intact all Atom nodes and remove each node that belongs to another namespace?

Something like this:

# the second argument registers the "atom" prefix for the XPath expression
valid_nodes = entry.find('atom:*', 'atom:http://www.w3.org/2005/Atom')
# now I need to create an XML string from valid_nodes, but how do I do that?

Upvotes: 1

Views: 408

Answers (1)

Tomalak

Reputation: 338316

In XSLT you could use this transformation:

<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/2005/Atom"
>
  <xsl:output method="xml" indent="yes" encoding="utf-8" />

  <xsl:template match="node() | @*">
    <xsl:if test="
      namespace-uri() = ''
      or
      namespace-uri() = 'http://www.w3.org/2005/Atom'
    ">
      <xsl:copy>
        <xsl:apply-templates select="node() | @*" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <xsl:template match="text()|comment()">
    <xsl:copy-of select="." />
  </xsl:template>
</xsl:stylesheet>

This copies a node verbatim only if it is

  • in no namespace (its namespace URI is empty)
  • in the Atom namespace
  • a text node or a comment

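Maybe you can use that. Note that xsl:copy also copies an element's namespace nodes, so unused declarations of the other namespaces may still appear in the output; they are harmless.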
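If you would rather stay in Ruby, the same filter can be sketched by walking the tree and dropping every element outside the Atom namespace. This is only an illustration under some assumptions: it uses REXML (which ships with standard Ruby installations) instead of LibXML-Ruby, `strip_foreign_elements` and the `ext` sample namespace are made-up names, and it leaves attributes and namespace declarations untouched.

```ruby
require 'rexml/document'

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Recursively removes every child element that is not in the Atom namespace.
# Hypothetical helper, not part of REXML or LibXML-Ruby.
def strip_foreign_elements(element)
  # Snapshot the children first so removal does not disturb the iteration.
  element.elements.to_a.each do |child|
    if child.namespace == ATOM_NS
      strip_foreign_elements(child)
    else
      child.remove
    end
  end
  element
end

# Made-up sample feed with one foreign element.
SAMPLE_FEED = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom" xmlns:ext="http://example.com/ext">
    <entry>
      <title>Hello</title>
      <ext:rating>5</ext:rating>
    </entry>
  </feed>
XML

doc = REXML::Document.new(SAMPLE_FEED)
strip_foreign_elements(doc.root)
puts doc.to_s
```

With LibXML-Ruby the same recursion should carry over, but the node and namespace accessors differ, so check them against that library's documentation.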

Upvotes: 2
