Reputation: 21
I have a very weird problem: I run the same code on two XML files, the second of which is a copy of the first (I copied the contents, which may be the problem). The code uses REXML to parse the file. The first file parses fine, but the second fails with this error: Failed: malformed XML: missing tag start Line: 2 Position: 102 Last 80 unconsumed characters:
<t>dede</t>
The contents of the XML file are:
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
Any ideas?
Thanks a lot
Upvotes: 2
Views: 3898
Reputation: 85
It's because of the file encoding. I had the same problem and found that my file was UCS-2 encoded. UTF-8 and ANSI both work, but UCS-2 doesn't, it seems; the file probably has to be converted to a supported encoding first. I tested the different encodings by converting the XML file in Notepad++.
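If you would rather do the conversion in Ruby than in Notepad++, something like the following may work. It is only a sketch: it assumes the file really is little-endian UTF-16 (what Windows tools usually mean by UCS-2) with a byte order mark, and it builds the bytes in memory instead of reading a real file. The variable names are mine, not from the question.

```ruby
require 'rexml/document'

# Build the same document as in the question, then encode it the way a
# "UCS-2" file would be stored: a BOM followed by 2 bytes per character.
xml = "\uFEFF" + "<?xml version=\"1.0\" standalone=\"yes\"?>\n<t>dede</t>\n"
raw = xml.encode("UTF-16LE").b   # raw bytes, as File.binread would return them

# Tell Ruby what the bytes really are, transcode to UTF-8, and strip the BOM
# so the parser sees "<?xml" as the very first characters.
utf8 = raw.force_encoding("UTF-16LE").encode("UTF-8").sub(/\A\uFEFF/, "")

doc = REXML::Document.new(utf8)
puts doc.root.text
```

For a real file you would replace the in-memory `raw` with the result of `File.binread` on your own path.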
Upvotes: 2
Reputation: 4439
REXML seems a bit too eager to throw a ParseException. Encoding is definitely a major culprit. Check the encoding of your files.
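One quick way to check is to look at the first few bytes of each file for a byte order mark. This is a rough sketch, not a full encoding detector: `sniff_bom` and `BOMS` are names I made up, it only knows three common marks, and a file without a BOM will simply return nil.

```ruby
# Well-known byte order marks, as binary strings, mapped to encoding names.
BOMS = {
  "\xEF\xBB\xBF".b => "UTF-8",
  "\xFF\xFE".b     => "UTF-16LE",
  "\xFE\xFF".b     => "UTF-16BE"
}.freeze

# Hypothetical helper: returns the encoding implied by the leading bytes,
# or nil when no known BOM is present.
def sniff_bom(bytes)
  head = bytes.b
  BOMS.each { |bom, name| return name if head.start_with?(bom) }
  nil
end

# Demo with in-memory data instead of real files.
utf16 = "\uFEFF<t>dede</t>".encode("UTF-16LE")
plain = "<t>dede</t>"

puts sniff_bom(utf16).inspect
puts sniff_bom(plain).inspect
```

On real files you would pass in the result of `File.binread` for each of the two paths and compare what comes back.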
Upvotes: 0
Reputation: 303381
I do not have any such problem using this code:
require 'rexml/document'
doc = REXML::Document.new <<ENDXML
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
ENDXML
doc.each_element('//t'){ |e| puts e }
#=> <t>dede</t>
What version of Ruby are you using, and what does your code actually look like?
Edit: Based on the new information that you're using the stream parser, here's another piece of code that also works for me using Ruby 1.8.7:
class Listener
  def method_missing( name, *args ); puts "I don't support '#{name}'"; end
  def tag_start( name, attrs ); puts "<#{name} #{attrs.inspect}>"; end
  def text( str ); p str; end
  def tag_end( name ); puts "</#{name}>"; end
end
require 'stringio'
xml = StringIO.new <<ENDXML
<?xml version="1.0" standalone="yes"?>
<t>dede</t>
ENDXML
require 'rexml/document'
doc = REXML::Document.parse_stream( xml, Listener.new )
#=> "\t"
#=> I don't support 'xmldecl'
#=> "\n\t"
#=> <t {}>
#=> "dede"
#=> </t>
#=> "\n"
Upvotes: 2