Mohammed H

Reputation: 7048

String replace in substring

I want to write a method in a Java class that takes as input a string of XML data like the one below.

<?xml version="1.0" encoding="UTF-8"?>
<library>

    <book>
        <name> <> Programming in ANSI C <> </name>
        <author> <>  Balaguruswamy <> </author>
        <comment> <> This comment may contain xml entities such as &, < and >. <> </comment>
    </book>

    <book>
        <name> <> A Mathematical Theory of Communication <> </name>
        <author> <> Claude E. Shannon <> </author>
        <comment> <> This comment also may contain xml entities. <> </comment>
    </book>

    <!-- This library contains more than ten thousand books. -->
</library>

The XML string contains many substrings that start and end with <>. These substrings may contain characters that are special in XML, such as >, <, &, ' and ". The method needs to replace them with &gt;, &lt;, &amp;, &apos; and &quot;, respectively.
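As a minimal sketch of the replacement itself (the class name `XmlEscape` is hypothetical), a single pass over the characters can map each of the five to its entity. Doing it in one pass avoids the ordering trap of chained `replace` calls, where replacing `<` before `&` would turn `&lt;` into `&amp;lt;`.

```java
// Hypothetical helper: escapes the five characters that are special in XML.
final class XmlEscape {
    private XmlEscape() {}

    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        // One pass: each input character is examined exactly once, so an
        // entity that was just emitted is never escaped a second time.
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For example, `XmlEscape.escape("a & b")` returns `"a &amp; b"`.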

Is there any regular-expression method in Java to accomplish this task?
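One regex-based sketch, assuming every <> marker is paired and that the markers themselves should be dropped from the output (the question does not say): match each span with the non-greedy pattern `<>(.*?)<>`, escape the captured text, and splice it back with `Matcher.appendReplacement`. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: escapes the text inside each <> ... <> span of an XML string.
final class DelimitedEscaper {
    // Non-greedy, so each span ends at the next <>; DOTALL lets a span cross line breaks.
    private static final Pattern SPAN = Pattern.compile("<>(.*?)<>", Pattern.DOTALL);

    private DelimitedEscaper() {}

    // Assumption: the <> markers are dropped and only the text between them is kept, escaped.
    static String escapeSpans(String xml) {
        Matcher m = SPAN.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String escaped = escape(m.group(1));
            // quoteReplacement keeps '$' or '\' in the data from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(escaped));
        }
        m.appendTail(out);
        return out.toString();
    }

    // '&' must be replaced first, otherwise the '&' in the new entities would be escaped again.
    private static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;")
                   .replace("'", "&apos;")
                   .replace("\"", "&quot;");
    }
}
```

For example, `escapeSpans("<comment> <> &, < and >. <> </comment>")` returns `"<comment>  &amp;, &lt; and &gt;.  </comment>"`. This breaks if the data inside a span itself contains `<>`, or if a closing marker is missing, which is the kind of edge case that makes a pure-regex approach fragile.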

Upvotes: 7

Views: 393

Answers (2)

Justin Pihony

Reputation: 67065

Is this data being passed to you, or can you control it? If you control it, I would suggest using a CDATA section. If you are really unsure about the data being entered into the XML elements, just wrap everything in a CDATA section before it is saved to the DB.
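If you do control how the data is stored, the wrapping could look like this minimal sketch (the class name `CdataWrapper` is hypothetical). One detail a CDATA section does not handle on its own: it ends at the first `]]>`, so that sequence in the data has to be split across two sections.

```java
// Hypothetical helper: wraps arbitrary text in a CDATA section.
final class CdataWrapper {
    private CdataWrapper() {}

    // A CDATA section ends at the first "]]>", so that sequence cannot appear inside one.
    // The usual workaround splits it: "]]" closes the first section, ">" starts the next.
    static String wrap(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

Inside the section, `&`, `<` and `>` need no escaping at all, which is why this sidesteps the entity replacement entirely.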

If you do not have control over it, then as far as I know this will take a fair amount of code because of the number of edge cases you may have to handle. It is not something a simple regex can deal with (whether a valid block is starting, whether one is ending, whether one has already ended, etc.).

Here is a very basic regex for the <> case; the rest, I believe, gets extremely complicated:

<>  //Matches each <> marker (< and > need no escaping in Java regex)
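Since `<` and `>` are not special in Java regex, a literal `<>` pattern is enough to find the markers. A rough sketch of one edge-case check, assuming markers always come in open/close pairs (class and method names hypothetical): count the markers and treat an odd count as an unclosed span. This is only a first sanity check; an even count does not prove that each pair sits inside a single element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a cheap sanity check on the <> markers before any replacement.
final class DelimiterCheck {
    private static final Pattern MARKER = Pattern.compile("<>");

    private DelimiterCheck() {}

    // Counts non-overlapping <> markers in the input.
    static int countMarkers(String xml) {
        Matcher m = MARKER.matcher(xml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // An odd count means some span was opened but never closed.
    static boolean markersBalanced(String xml) {
        return countMarkers(xml) % 2 == 0;
    }
}
```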

Upvotes: 3

punny

Reputation: 403

You could follow these steps:

  1. Read the XML file with DOM or SAX.
  2. Replace the strings with a regular expression.
  3. Write the XML file with DOM or SAX.
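A caveat on step 1: it assumes the input is well-formed XML, and the sample in the question, with raw `&` and `<` inside elements, would be rejected by a conforming DOM or SAX parser, so some pre-pass would still be needed. The sketch below illustrates step 3 only, using the JDK's built-in DOM and `Transformer` APIs (the class and method names are hypothetical): when text is set with `setTextContent` and then serialized, the serializer escapes `&` and `<` itself.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of step 3: build a <book> with DOM and serialize it.
final class DomWriteSketch {
    private DomWriteSketch() {}

    static String writeBook(String name, String comment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element library = doc.createElement("library");
            Element book = doc.createElement("book");
            Element nameEl = doc.createElement("name");
            nameEl.setTextContent(name);          // raw text; no manual escaping
            Element commentEl = doc.createElement("comment");
            commentEl.setTextContent(comment);
            book.appendChild(nameEl);
            book.appendChild(commentEl);
            library.appendChild(book);
            doc.appendChild(library);

            // The serializer turns special characters in text nodes into entities.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }
}
```

Going through DOM this way means the `&` and `<` replacements do not have to be written by hand; quotes and apostrophes in text content are legal as-is and may be left unescaped by the serializer.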

Upvotes: 2
