leba-lev

Reputation: 2896

Regular expression (repeated boundary pattern) in Java

Here is an example of my string:

<s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>

I need to clean the string by removing every sequence enclosed in angle brackets, including the angle brackets themselves. For the example string above, the desired output would be:

Here we show that the approximately 600-amino acid; region something somethingelse .

With the regular expression \<{1}.*\>{1} and the replaceAll function, the entire line gets replaced. I understand why: the greedy .* matches from the first < all the way to the last > on the line. Could someone point out a way to express the pattern more specifically, so that I get the desired output?
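A minimal sketch reproducing the behaviour described above, assuming the pattern is applied with String.replaceAll to the example string. The class and method names are hypothetical. For comparison it also runs a reluctant variant, <.*?>, which stops at the first > after each <, so each tag becomes a separate match.

```java
// Hypothetical demo class: greedy vs. reluctant matching of angle-bracket tags.
final class GreedyVsReluctant {

    // The pattern from the question as a Java string literal.
    // "\\<" and "\\>" are escaped literal angle brackets; "{1}" is redundant.
    static final String GREEDY = "\\<{1}.*\\>{1}";

    // Reluctant ".*?" stops at the FIRST '>' after each '<'.
    static final String RELUCTANT = "<.*?>";

    // Replaces every match of the given regex with an empty string.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "<s id=\"1\">Here we show that <ANAPH id=\"535\" biotype=\"partof_product\">"
                + "the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>";

        // Prints "[]": the greedy pattern matched from the first '<' to the last '>'.
        System.out.println("[" + strip(s, GREEDY) + "]");
        // Prints the sentence with all four tags removed.
        System.out.println("[" + strip(s, RELUCTANT) + "]");
    }
}
```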

Thank you.


Edit 1:

Yes, the regular expression suggested by Kassym Dorsel handles the string above.

However, for the string below:

<s id="7"><ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>

Applying the regular expression produces this output:

<ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>

The desired output is:

The C. elegans genome sequence was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .

Would you be able to help me generalize the regular expression?
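The output shown above is exactly the input with only its first tag, <s id="7">, removed, which is what String.replaceFirst produces. A sketch comparing the two calls on this string follows, under the assumption (not confirmed in the question) that the difference lies in how the pattern was applied rather than in the pattern itself. The class name is hypothetical.

```java
// Hypothetical demo class: replaceFirst vs. replaceAll with the answer's pattern.
final class FirstVsAll {

    // The pattern suggested in the answer, as a Java string literal.
    static final String TAG = "<[^>]*?>";

    // The string from Edit 1, split across lines only for readability.
    static final String INPUT = "<s id=\"7\"><ANAPH id=\"100216\" biotype=\"supertype\" assoc_ante=\"48275\""
            + " assoc_rel=\"set-member\" coref_chain=\"set_234\">The C. elegans genome sequence</ANAPH>"
            + " was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes"
            + " are essentially completely sequenced at this point .</s>";

    public static void main(String[] args) {
        // Removes only the first match, <s id="7">, so the <ANAPH ...> tags survive.
        System.out.println(INPUT.replaceFirst(TAG, ""));
        // Removes every match, giving the desired output.
        System.out.println(INPUT.replaceAll(TAG, ""));
    }
}
```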

Upvotes: 0

Views: 179

Answers (1)

Kassym Dorsel

Reputation: 4843

Given this: <s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>

Using the pattern <[^>]*?> and replacing each match with an empty string gives:

Here we show that the approximately 600-amino acid; region something somethingelse .
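A minimal sketch wrapping this pattern in a reusable Java helper; the class and method names are hypothetical. Because [^>] can never match a >, each match already ends at the nearest >, so the reluctant ? is harmless but not needed. The sketch assumes no > appears inside attribute values; for arbitrary XML, a real parser is the safer choice.

```java
import java.util.regex.Pattern;

// Hypothetical helper built around the answer's pattern.
final class TagStripper {

    // '<', then any run of non-'>' characters, then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // str.replaceAll(regex, repl) is equivalent to
    // Pattern.compile(regex).matcher(str).replaceAll(repl),
    // so compiling once here avoids repeating that work for every string.
    static String stripTags(String input) {
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<s id=\"1\">Here we show that <ANAPH id=\"535\" biotype=\"partof_product\">"
                + "the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>"));
    }
}
```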

Upvotes: 4
