Validating a pseudo-XML file using Perl

Question

I have a file that contains XML-like tags along with a lot of invalid XML data, so I cannot run a normal XML validator such as xmllint on it. I want to ignore the invalid XML data and just check the file for well-formedness.



5 

   bunch of text which also contains tags like   
   more tags like <->     & ; 
   some more text and numbers

In the above example, can I just ignore tags like , <->, &, ; etc. and check only for valid opening and closing tags like and ? The above file should be reported as well-formed, since all the valid tags have proper opening and closing brackets.

Can I create my own DTD/XSD that looks only for the tags I want and ignores the rest, using Perl?

My main problem is that I don't know the right keywords to describe my problem, which is why Google is not giving me the right results. Can someone please point me in the right direction? Thanks.

zostay · Accepted Answer

You'll have to clean up the input first. Once you do that, you can use DTDs, schemas, a proper parser, and whatever else you need.

If it's just the OUTPUT tag, you can try this:

s/(<OUTPUT>)/$1<![CDATA[/; s/(<\/OUTPUT>)/]]>$1/;

After that is done, your input should be ready for XML parsing, validation, etc. If your input might contain CDATA sections, you'll have to do more, but that should be enough to get started.
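As a rough sketch of the approach above, the snippet below wraps the body of the OUTPUT element in a CDATA section and then, only if XML::LibXML happens to be installed, asks it whether the result is well-formed. The sample input and the ROOT and COUNT tag names are made up for illustration, and the helper name wrap_output_in_cdata is hypothetical. It assumes the text inside OUTPUT never contains "]]>" itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Wrap the body of every OUTPUT element in a CDATA section so that stray
# characters such as <-> or & are treated as plain text by an XML parser.
# Assumption: the body never contains the sequence "]]>".
sub wrap_output_in_cdata {
    my ($text) = @_;
    $text =~ s/(<OUTPUT>)/$1<![CDATA[/g;
    $text =~ s/(<\/OUTPUT>)/]]>$1/g;
    return $text;
}

# Hypothetical sample input; only the OUTPUT tag name comes from the answer.
my $raw = <<'END';
<ROOT>
<COUNT>5</COUNT>
<OUTPUT>
   bunch of text with <-> & ; and <junk
</OUTPUT>
</ROOT>
END

my $clean = wrap_output_in_cdata($raw);
print $clean;

# Optional well-formedness check, run only when XML::LibXML is available.
# load_xml dies on a parse error, so it is wrapped in eval.
if (eval { require XML::LibXML; 1 }) {
    my $ok = eval { XML::LibXML->load_xml(string => $clean); 1 };
    print $ok ? "well-formed\n" : "not well-formed: $@";
}
```

If more than one element can hold junk text, the same pair of substitutions would need to be repeated for each tag name.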
