I'm using Nokogiri to parse an XML file. Some of the nodes in the file have namespaced attributes:
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden" opf:scheme="ISBN">xxxx</dc:identifier>
<dc:creator opf:role="aut" opf:file-as="Name">xxxx</dc:creator>
<dc:date opf:event="publication">xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>
I'm trying to remove every attribute with the "opf" prefix. I've found XPath solutions for matching an attribute by a partial match on its value, but what about a partial match on the attribute name itself? I've tried many things that haven't worked. As a first step I just tried to extract the attribute names, but if I do:
elements = @doc.at_xpath('//xmlns:metadata').children
elements.each { |el|
  el.attributes.each { |attribute|
    if attribute[1].namespace_scopes[1].prefix == "opf"
      puts attribute[0]
    end
  }
}
I end up getting:
id
scheme
role
file-as
event
name
content
but I only want the ones with the "opf" prefix ("opf:scheme", "opf:role", "opf:file-as", "opf:event") so that I can remove them without touching any of the other attributes. I even tried to force it by hard-coding the attributes I knew existed:
opf_attributes = ["opf:file-as", "opf:scheme", "opf:role", "opf:event"]
elements.each { |el|
  opf_attributes.each { |x|
    el.remove_attribute(x) if el[x] != nil
  }
}
This is not the smartest approach, but it still didn't work: nothing happens to the nodes, and the attributes remain as they were. (I don't know if it's worth noting, but if I use the remove_attr(x) method instead, I get this error: undefined method 'remove_attr' for #<Nokogiri::XML::Element:0x...>.)
So, my question is:
Is there a clearer way to
Upvotes: 2
Views: 1345
Reputation: 198436
I believe this is much simpler:
doc.xpath('//@opf:*', { opf: "http://www.idpf.org/2007/opf" }).each(&:remove)
`//` searches any descendant node, `@` indicates it has to be an attribute node, `opf:` in conjunction with the namespace definition (`{ opf: "http://www.idpf.org/2007/opf" }`) says which namespace it has to belong to, and `*` matches any name.
Note that `opf:` by itself doesn't mean anything; `"http://www.idpf.org/2007/opf"` does, and `opf` is just a shorthand for it within its scope. `.xpath('//@foobar:*', { foobar: "http://www.idpf.org/2007/opf" })` would work just as well for your case.
Since you have the namespace definition on the root, and it doesn't change within the document, you can simplify to
doc.xpath('//@opf:*', doc.namespaces).each(&:remove)
but note that this is not generally safe (e.g. the namespace could be defined on a subnode). `doc.collect_namespaces` is a bit safer, but even then you are not completely safe (e.g. if the same prefix is used for two different URIs in different parts of the document). I'd go with the first one (explicit URI) unless I had actually seen the XML myself and knew where and how the prefix is defined and used.
tl;dr: Prefixes mean nothing, refer to the associated URI instead.
Upvotes: 1
Reputation: 23949
Node objects have a `remove` method that drops them from the tree, so you can write something like this:
require 'nokogiri'
doc = Nokogiri::XML(DATA)
puts '--- Before'
puts doc.to_s
doc.traverse do |node|
  next unless node.respond_to? :attributes
  # Drop every attribute whose namespace prefix is "opf".
  node.attributes.each do |_key, val|
    val.remove if val&.namespace&.prefix == 'opf'
  end
end
puts
puts '--- After'
puts doc.to_s
__END__
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden" opf:scheme="ISBN">xxxx</dc:identifier>
<dc:creator opf:role="aut" opf:file-as="Name">xxxx</dc:creator>
<dc:date opf:event="publication">xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>
Running it produces the following output:
➜ ~ ruby test.rb
--- Before
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden" opf:scheme="ISBN">xxxx</dc:identifier>
<dc:creator opf:role="aut" opf:file-as="Name">xxxx</dc:creator>
<dc:date opf:event="publication">xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>
--- After
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier id="iden">xxxx</dc:identifier>
<dc:creator>xxxx</dc:creator>
<dc:date>xxxx</dc:date>
<dc:publisher>xxxx</dc:publisher>
<meta name="cover" content="x"/>
</metadata>
Note: if the Ruby version you are using doesn't support the safe-navigation operator `&.` (introduced in Ruby 2.3), you'll need to handle the namespace potentially being nil yourself.
Upvotes: 1