Reputation: 2896
Here is an example of my string:
<s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>
I need to clean the string by removing every sequence enclosed in angle brackets (including the angle brackets themselves). For the example string above, the desired output would be:
Here we show that the approximately 600-amino acid; region something somethingelse .
With the regular expression \<{1}.*\>{1} and the replaceAll function, the entire line gets replaced; I understand why it happens that way. Could someone point out a way to express the pattern more specifically, so that I obtain the desired output?
Thank you.
Edit1:
Yes, the above string is handled correctly by the regular expression suggested by Kassym Dorsel.
However, for the string below:
<s id="7"><ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>
The output I get when using the regular expression is:
<ANAPH id="100216" biotype="supertype" assoc_ante="48275" assoc_rel="set-member" coref_chain="set_234">The C. elegans genome sequence</ANAPH> was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .</s>
The desired output is:
The C. elegans genome sequence was completed two years ago [ 1 ] , and both the Drosophila [ 2 ] and human genomes are essentially completely sequenced at this point .
Could you help me generalize the regular expression so that it handles this case as well?
Upvotes: 0
Views: 179
Reputation: 4843
Given this: <s id="1">Here we show that <ANAPH id="535" biotype="partof_product">the approximately 600-amino acid; region</ANAPH> something somethingelse .</s>
Using this pattern: <[^>]*?>
and replacing every match with an empty string gives this:
Here we show that the approximately 600-amino acid; region something somethingelse .
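As a minimal sketch of how this could look in Java (assuming Java is the language in use, since the question mentions replaceAll), the pattern can be compiled once and applied to both strings from the question. The class name TagStripper and the method stripTags are made up for illustration. The lazy `?` in the pattern is kept as written above, although the negated class `[^>]` already prevents a match from running past the first `>`.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names TagStripper and stripTags are illustrative only.
class TagStripper {
    // "<", then any characters other than ">", then the closing ">".
    private static final Pattern TAG = Pattern.compile("<[^>]*?>");

    // Removes every angle-bracket-enclosed sequence from the input.
    static String stripTags(String input) {
        // replaceAll replaces all matches, not only the first one.
        return TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String first = "<s id=\"1\">Here we show that "
                + "<ANAPH id=\"535\" biotype=\"partof_product\">"
                + "the approximately 600-amino acid; region</ANAPH>"
                + " something somethingelse .</s>";
        String second = "<s id=\"7\"><ANAPH id=\"100216\" biotype=\"supertype\" "
                + "assoc_ante=\"48275\" assoc_rel=\"set-member\" "
                + "coref_chain=\"set_234\">The C. elegans genome sequence</ANAPH>"
                + " was completed two years ago [ 1 ] , and both the Drosophila [ 2 ]"
                + " and human genomes are essentially completely sequenced"
                + " at this point .</s>";

        System.out.println(stripTags(first));
        System.out.println(stripTags(second));
    }
}
```

With replaceAll, the second string from the edit should also come out clean, because every tag is matched separately. If only the first tag disappears, one possible cause (a guess, since the calling code is not shown) is that replaceFirst or a single match was used instead of replaceAll.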
Upvotes: 4