user1881095

Reputation: 11

How do I stop REXML from escaping characters?

I'm pulling a bunch of data from one database and feeding it into an application via XML.

So I start with

    require 'rexml/document'
    include REXML

    re_objects_xml = Document.new
    re_objects_xml.context[:attribute_quote] = :quote
    re_objects_xml.context[:raw] = 'true'
    re_objects_xml.add_element("object-collection")
    base_object_collection = re_objects_xml.elements[1]

    timeline_meta = Element.new("Metadata")
    timeline_meta.add_attribute("id", "#{re_meta_id}")

And then I have the following variables:

    k = "Comments"
    v = "We're pretty good"

and I do:

    timeline_meta.add_attribute("#{k}","#{v}")

Then I add timeline_meta to base_object_collection:

    base_object_collection << timeline_meta

I end up with XML that contains this:

    ...Comments="GRUBB:  We&apos;re pretty good...

I'm trying to get:

    ...Comments="GRUBB:  We're pretty good...

Can anyone help me see what I'm missing or a better way to do this?

Upvotes: 1

Views: 765

Answers (2)

Maksym Bykovskyy

Reputation: 832

I know this question is very old, but I just came across the same issue, and my findings might help people who are still forced to work with Ruby 1.8.6.

The thing is, REXML's implementation depends heavily on the Ruby version; in fact, it differs a lot even between patch levels of Ruby 1.8.6.

The context flag that should stop REXML from escaping entities is :raw, but the fact that it isn't working in your case could mean that your REXML doesn't understand the flag, or doesn't understand the value you're setting it to.

If you're using a Ruby version earlier than 1.8.6-p110, you're out of luck: those versions don't support context flags like :attribute_quote or :raw. Your only options are to either

  1. Upgrade to Ruby 1.8.6-p110 or later.

  2. Or post-process the raw XML, replacing the escaped entities. This should work because REXML always escapes & itself (& becomes &amp;, and &amp; becomes &amp;amp;), so entity-like text that was literally in your data can't be confused with an entity REXML produced.
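Option 2 could look something like the minimal sketch below. The helper name unescape_apostrophes is hypothetical (it is not part of REXML), and it assumes every attribute in the output is delimited with double quotes, e.g. via context[:attribute_quote] = :quote; inside a '...'-delimited attribute a bare ' would end the value early.

```ruby
# Hypothetical helper, not a REXML API: undo &apos; escaping in
# already-serialized XML. Assumes all attributes use double quotes.
def unescape_apostrophes(xml)
  # A literal "&apos;" in the original data was serialized as "&amp;apos;",
  # which does not contain the substring "&apos;", so it is left untouched.
  xml.gsub("&apos;", "'")
end

raw = %q{<Metadata id="1" Comments="We&apos;re pretty good"/>}
puts unescape_apostrophes(raw)
# prints: <Metadata id="1" Comments="We're pretty good"/>
```

Only &apos; is touched here; unescaping &amp; as well would require doing it last, or the protection described above is lost.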

If you're using a later version of Ruby, then context[:raw] has to be set to :all or to a list of element names to process in raw mode; a string such as 'true' won't be recognized. The context can also be passed into the Document constructor, like so: Document.new(nil, {:raw => :all, :attribute_quote => :quote})
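A minimal sketch of the constructor form, assuming a Ruby whose REXML understands these context flags. Elements created through add_element are built with the document's context, and add_text then creates its text node in raw mode. The element names are only illustrative. Note that this exercises :raw through a text node; whether :raw also changes how add_attribute escapes values may depend on your REXML version, so check that against your own output.

```ruby
require "rexml/document"

# Pass the context straight into the constructor instead of assigning it later.
doc = REXML::Document.new(nil, { :raw => :all, :attribute_quote => :quote })

# add_element builds the child with the parent's context, so :raw reaches it.
collection = doc.add_element("object-collection")
comments   = collection.add_element("Comments") # illustrative element name

# With :raw => :all, add_text creates the text node in raw mode,
# so it is written out exactly as given instead of being escaped.
comments.add_text("We're pretty good")

puts doc.to_s
```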

Upvotes: 0

the Tin Man

Reputation: 160581

Why are you worrying about a single-quote/apostrophe being converted into the entity? The XML parser/engine does that to help preserve what could be an ambiguous/colliding delimiting character. From the XML spec about Character Data and Markup:

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Because the value of the Comments attribute can be delimited with either ' or ", the spec allows the embedded single and double quotes to be encoded as entities, which avoids collisions with the delimiter.
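To make the collision concrete, here is a small pure-Ruby sketch of what the spec's AttValue rule requires: inside an attribute value only <, & and whichever quote character delimits the value must be escaped. quote_attr is a hypothetical helper for illustration, not a REXML method. It shows that with double-quote delimiters the apostrophe could legally stay literal.

```ruby
# Hypothetical helper, not part of REXML: serialize an attribute value with
# the chosen delimiter, escaping only <, & and the delimiter itself.
def quote_attr(value, delim = '"')
  escaped = value.gsub("&", "&amp;").gsub("<", "&lt;")
  escaped = delim == '"' ? escaped.gsub('"', "&quot;") : escaped.gsub("'", "&apos;")
  "#{delim}#{escaped}#{delim}"
end

puts quote_attr(%q{We're "pretty" good})      # prints: "We're &quot;pretty&quot; good"
puts quote_attr(%q{We're "pretty" good}, "'") # prints: 'We&apos;re "pretty" good'
```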

When the XML is parsed on the receiving side, the parser should decode that entity back into the correct character, or provide some function/method that makes it easy. You don't say which DBM you're using; it may be able to help, but that's a separate question.
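For example, REXML itself decodes the entity when it reads the document back. A small sketch, assuming the rexml library is available in your Ruby; the id value is made up for illustration:

```ruby
require "rexml/document"

# XML shaped like the question's output (the id value is illustrative).
xml = %q{<object-collection><Metadata id="1" Comments="We&apos;re pretty good"/></object-collection>}

doc  = REXML::Document.new(xml)
meta = doc.root.elements["Metadata"]

# Attribute lookups return the decoded value, so &apos; comes back as '.
puts meta.attributes["Comments"]
# prints: We're pretty good
```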

As a stylistic thing in your code:

    timeline_meta.add_attribute("#{k}","#{v}")

is wrong. You're redundantly converting strings into strings. Use:

    timeline_meta.add_attribute(k, v)

instead.

Upvotes: 1
