rexml_sucks

Reputation: 21

ruby malformed XML: missing tag start

I have a very weird problem. I run the same code on two XML files, the second of which is a copy of the first (I copied the contents, which may be the problem). The code uses REXML to parse the file. The first file parses fine, but the second fails with this error: Failed: malformed XML: missing tag start Line: 2 Position: 102 Last 80 unconsumed characters:

<t>dede</t> 

The contents of the XML file are:

<?xml version="1.0" standalone="yes"?>
<t>dede</t>

Any ideas?

Thanks a lot

Upvotes: 2

Views: 3898

Answers (3)

Vlad

Reputation: 85

It's because of the file encoding. I had the same problem and found that my file was UCS-2 encoded. UTF-8 and ANSI both work, but UCS-2 apparently doesn't; it probably needs a specialized parser for that format first. I tested the different encodings by converting the XML file in Notepad++.
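If converting in an editor is not an option, the same conversion can be done in Ruby before the document reaches REXML. The following is only a minimal sketch: `xml_bytes_to_utf8` is a hypothetical helper (not part of REXML), it assumes the file starts with a byte-order mark when it is UTF-16, and the sample bytes are built in memory rather than read from the asker's file. Whether a particular REXML version copes with UTF-16 input on its own is not established here.

```ruby
require 'rexml/document'

# Hypothetical helper (not a REXML API): turn raw file bytes into a UTF-8
# string, using a leading byte-order mark to pick the source encoding.
# Assumption: input without a BOM is already UTF-8 compatible.
def xml_bytes_to_utf8(bytes)
  raw = bytes.dup.force_encoding(Encoding::BINARY)
  if raw.start_with?("\xFF\xFE".b)        # UTF-16 little-endian BOM
    raw[2..-1].force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  elsif raw.start_with?("\xFE\xFF".b)     # UTF-16 big-endian BOM
    raw[2..-1].force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  elsif raw.start_with?("\xEF\xBB\xBF".b) # UTF-8 BOM
    raw[3..-1].force_encoding(Encoding::UTF_8)
  else
    raw.force_encoding(Encoding::UTF_8)
  end
end

# Simulate a UTF-16LE file with a BOM (a real file would be read with
# File.binread instead).
source = %(<?xml version="1.0" standalone="yes"?>\n<t>dede</t>\n)
utf16_bytes = "\xFF\xFE".b + source.encode(Encoding::UTF_16LE).b

doc = REXML::Document.new(xml_bytes_to_utf8(utf16_bytes))
puts doc.root.text
```

Reading the file in binary mode and converting explicitly keeps the decoding step visible, which makes it easier to tell an encoding problem apart from genuinely malformed XML.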

Upvotes: 2

mcv

Reputation: 4439

REXML seems a bit too eager to throw a ParseException. Encoding is definitely a major culprit. Check the encoding of your files.
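One way to check a file's encoding without an editor is to look at its first few bytes for a byte-order mark. This is a rough sketch only: `bom_encoding` is a hypothetical helper name, it recognises just three common BOMs, and a file with no BOM simply yields `nil`, which says nothing definite about its encoding.

```ruby
# Hypothetical helper: return the encoding name implied by a leading
# byte-order mark, or nil when the data does not start with one.
def bom_encoding(bytes)
  head = bytes.byteslice(0, 3).to_s.b
  return 'UTF-8'    if head.start_with?("\xEF\xBB\xBF".b)
  return 'UTF-16LE' if head.start_with?("\xFF\xFE".b)
  return 'UTF-16BE' if head.start_with?("\xFE\xFF".b)
  nil
end

# In-memory samples stand in for File.binread(path) on the two real files.
plain = "<t>dede</t>".b
utf16 = "\xFF\xFE".b + "<t>dede</t>".encode(Encoding::UTF_16LE).b

puts bom_encoding(plain).inspect
puts bom_encoding(utf16).inspect
```

Running this against both the working file and the failing copy would show whether they differ in their leading bytes even though they look identical in an editor.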

Upvotes: 0

Phrogz

Reputation: 303381

I do not have any such problem using this code:

require 'rexml/document'
doc = REXML::Document.new <<ENDXML
  <?xml version="1.0" standalone="yes"?>
  <t>dede</t>
ENDXML

doc.each_element('//t'){ |e| puts e }
#=> <t>dede</t>

What version of Ruby are you using, and what does your code actually look like?

Edit: Based on the new information that you're using the stream parser, here's another piece of code that also works for me using Ruby 1.8.7:

class Listener
  def method_missing( name, *args ); puts "I don't support '#{name}'"; end
  def tag_start( name, attrs ); puts "<#{name} #{attrs.inspect}>"; end
  def text( str ); p str; end
  def tag_end( name ); puts "</#{name}>"; end
end

require 'stringio'
xml = StringIO.new <<ENDXML
    <?xml version="1.0" standalone="yes"?>
    <t>dede</t>
ENDXML

require 'rexml/document'
doc = REXML::Document.parse_stream( xml, Listener.new )
#=> "\t"
#=> I don't support 'xmldecl'
#=> "\n\t"
#=> <t {}>
#=> "dede"
#=> </t>
#=> "\n"

Upvotes: 2
