John

Reputation: 5249

How can I ignore a bad xmlns namespace with Perl's LibXML?

I have an XML document that references a namespace that is not available:

<microplateDoc xmlns="http://moleculardevices.com/microplateML">
...my data is here...
</microplateDoc>

I have a script that reads it fine, but only when I delete the two tags above; otherwise the parsed result comes out garbled. Is it OK to just ignore the namespace? I'm thinking of writing another script to go through all of my input files and delete these two lines, but I suspect there is a better way.

If I did go through all my data files and delete these two lines, what is the best way to do it with a script? I presume I would just open each file, search for those tags, delete them, and save the file. Can you think of a better way? Thanks.

Upvotes: 2

Views: 3316

Answers (5)

Gayathri Jayakumar

Reputation: 9

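You can select the node with an XPath expression such as //*[name()="microplateDoc"], which matches on the element's name as written, so no namespace prefix needs to be registered. Hope this works. Thanks.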
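As a minimal sketch of that idea with XML::LibXML: the sample document and its <well> element below are made up for illustration, since the real data is elided in the question. The second query uses local-name(), a variant that keeps working even if the document later adds a prefix to its elements.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; the <well> element is an assumption.
my $xml = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
  <well id="A1">0.123</well>
</microplateDoc>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# name() compares the element name as written in the document.
my @roots = $doc->findnodes('//*[name()="microplateDoc"]');

# local-name() ignores any prefix, so it is the more robust variant.
my @wells = $doc->findnodes('//*[local-name()="well"]');

printf "roots: %d, wells: %d\n", scalar @roots, scalar @wells;
```

Both queries match regardless of which namespace the elements belong to, which is convenient but also means elements from unrelated namespaces with the same name would match too.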

Upvotes: 0

Scott

Reputation: 11

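So what you are indicating is that the XML::LibXML module is not properly parsing your XML content when the document declares a default namespace without a prefix? A workaround is to strip the namespace declaration from the text before parsing it. You can do something such as the following:

# Remove the first default-namespace declaration, keeping the rest of the tag.
$xml =~ s/\s+xmlns=(["']).*?\1//;

This removes only the xmlns="..." attribute itself. (Matching a run of non-whitespace after xmlns= would also swallow the tag's closing > and break the document.)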

Upvotes: 1

hcayless

Reputation: 1046

I don't think there's anything wrong with your namespace there, and I wouldn't go messing with the input files unless you're confident there won't be any unwelcome side effects. What I think is happening is a common beginner XML-processing mistake: namespaces need to be registered (i.e. bound to a prefix) in your code before you can access the nodes in that namespace.

http://perl-xml.sourceforge.net/faq/#namespaces_xpath looks like a useful example. I don't generally work with Perl, but I've seen this happen in a bunch of other languages.
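For what it's worth, here is a rough sketch of what registering the namespace could look like with XML::LibXML's XPathContext. The prefix mp is an arbitrary choice, and the <well> element is invented for the example because the real data isn't shown in the question.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Hypothetical sample document; the <well> element is an assumption.
my $xml = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
  <well id="A1">0.123</well>
</microplateDoc>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# In XPath 1.0 an unprefixed name means "no namespace", so this finds
# nothing even though <well> is clearly in the document.
my @unbound = $doc->findnodes('//well');

# Bind a prefix of our choosing ("mp") to the document's namespace URI,
# then use that prefix in the XPath expression.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(mp => 'http://moleculardevices.com/microplateML');
my @bound = $xpc->findnodes('//mp:well');

printf "without prefix: %d, with prefix: %d\n", scalar @unbound, scalar @bound;
```

The prefix only has to match between registerNs and the XPath expression; it does not need to appear anywhere in the input files, so they can stay untouched.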

Upvotes: 2

Robert Rossney

Reputation: 96860

I have an XML document that references a namespace that is not available:

I suspect you're confused about what an XML namespace is. A namespace is a Uniform Resource Identifier, which is to say a string of characters that conforms to RFC 3986. It's not (necessarily) a Uniform Resource Locator, though it can be, as URLs are all URIs.

The important thing is: Just because an XML namespace begins with http:// doesn't mean that the XML parser is going to look it up. It won't (unless the person who wrote it doesn't understand what namespaces are, in which case you're going to have a lot more problems than this).
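To illustrate, here is a small sketch using XML::LibXML (the <well> element is made up for the example). The no_network parser option forbids network access during parsing, yet the document still loads, and the namespace comes back as a plain string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; the <well> element is an assumption.
my $xml = '<microplateDoc xmlns="http://moleculardevices.com/microplateML">'
        . '<well>0.123</well></microplateDoc>';

# no_network forbids the parser from fetching anything over the network.
# Parsing still succeeds because the namespace URI is never dereferenced.
my $doc  = XML::LibXML->load_xml(string => $xml, no_network => 1);
my $root = $doc->documentElement;

# The namespace is just an identifying string attached to the element.
my $ns_uri = $root->namespaceURI;

print "root element: ", $root->localname, "\n";
print "namespace:    ", $ns_uri, "\n";
```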

It's impossible to tell what you mean when you say that the script reading this XML document "reads it all screwed up." Is it OK to ignore it? It may very well be. Part of the purpose of namespaces, after all, is to make it possible to embed information in an XML document that some consumers of that document can ignore.

On the other hand, if you're not the only one who uses those files, you could be making big trouble for yourself by deleting data that someone else needs.

Upvotes: 4

Ether

Reputation: 53996

Regarding removing lines from a file, this exact question was asked earlier today. (Just use sed's d command, as in sed '/pattern/d', to delete the matching lines.)
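If you'd rather stay in Perl, the same line filter might look roughly like this. The helper name strip_root_lines and the sample input are hypothetical, and the pattern assumes each of the two tags sits alone on its own line, as in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text with any line that consists only
# of the opening or closing microplateDoc tag removed.
sub strip_root_lines {
    my ($text) = @_;
    return join '',
        grep { !m{^\s*</?microplateDoc\b[^>]*>\s*$} }
        split /^/m, $text;    # split into lines, keeping the newlines
}

# Made-up sample input; the <well> element is an assumption.
my $input = <<'XML';
<microplateDoc xmlns="http://moleculardevices.com/microplateML">
<well>0.123</well>
</microplateDoc>
XML

my $output = strip_root_lines($input);
print $output;    # prints only the <well> line
```

Keep in mind that what remains is well-formed XML only if a single top-level element is left between the two deleted lines.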

Upvotes: 1
