Roshanck

Reputation: 2290

How to insert special characters taken from a string into another string?

I have a string,

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships."

And I have another string, 'string2', which contains only the noun phrases from 'string1', each surrounded by '<NOUN>' and '</NOUN>' tags and separated by a space.

    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>"

Note that the second string can contain any number of noun-tagged phrases, depending on 'string1' (e.g., if string1 has 3 nouns, string2 will have the same 3 nouns surrounded by noun tags).
I want to add the tags to 'string1' so that it becomes:

    string1 = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships."

I used the following code to do this:

    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    while(m.find()) {
        string1= string1.replaceAll(m.group(1),m.group(0));
    } 

But it gives me the following output:

<NOUN><NOUN><NOUN>Sri Lanka</NOUN></NOUN> National Chess Championship</NOUN> this year and represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> at represented <NOUN><NOUN>Sri Lanka</NOUN></NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.

Can anyone please tell me how to do this correctly?
Or how to get the desired output from the given output?

Upvotes: 3

Views: 672

Answers (2)

Grisha Weintraub

Reputation: 7986

Instead of:

string1= string1.replaceAll(m.group(1),m.group(0));

use:

string1= string1.replaceAll("(?<!<NOUN>)("+m.group(1)+")(?!</NOUN>)",m.group(0));

For background, read up on "Look Ahead and Look Behind Constructs" in regular expressions.
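
One caveat when the pattern is built by string concatenation: if a noun contains regex metacharacters (for example '.' or '('), or the replacement contains '$' or '\', replaceAll may fail or match the wrong text. Below is a minimal sketch of the same lookaround idea with both sides quoted. The class name LookaroundTagger and the method tag are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundTagger {

    // Wraps every noun listed in 'tagged' with its tags inside 'text',
    // skipping occurrences that already sit directly between the tags.
    static String tag(String text, String tagged) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        String result = text;
        while (m.find()) {
            // Pattern.quote treats the noun as literal text, not as a regex.
            String regex = "(?<!<NOUN>)" + Pattern.quote(m.group(1)) + "(?!</NOUN>)";
            // quoteReplacement protects '$' and '\' in the replacement string.
            result = result.replaceAll(regex, Matcher.quoteReplacement(m.group(0)));
        }
        return result;
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Note that the lookarounds only guard text directly adjacent to a tag, so a noun that appears in the middle of an already tagged phrase would still be wrapped a second time.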

Upvotes: 2

prathmesh.kallurkar

Reputation: 5686

The problem with your example is that "Sri Lanka National Chess Championship" is a noun, and "Sri Lanka", which is part of that string, is also a noun. So your loop replaces the same text multiple times.
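
To see the repeated replacement happen step by step, here is a small reproduction of the loop from the question that prints the string after every pass. It uses a shortened sample input rather than the full strings from the question, and the names NaiveTagDemo and naive are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NaiveTagDemo {

    // Same loop as in the question: each noun is replaced everywhere,
    // including inside text that an earlier pass has already tagged.
    static String naive(String text, String tagged) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        String result = text;
        while (m.find()) {
            result = result.replaceAll(m.group(1), m.group(0));
            // Show the intermediate state after this noun was processed.
            System.out.println(m.group(1) + " -> " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample: the second noun is a prefix of the first one.
        naive("Sri Lanka Open and Sri Lanka",
              "<NOUN>Sri Lanka Open</NOUN> <NOUN>Sri Lanka</NOUN>");
    }
}
```

With this input, the second pass also wraps the "Sri Lanka" inside the already tagged "Sri Lanka Open", so the final string is <NOUN><NOUN>Sri Lanka</NOUN> Open</NOUN> and <NOUN>Sri Lanka</NOUN>.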

You can solve this by not touching fragments that have already been replaced. For each match, I split the string into three parts: before, match-str, and after, keeping the parts in their original order. A Vector is a convenient data structure for this.

import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class Check {

static String print(Vector<String> parts) {
    String str = parts.elementAt(0);

    for(int i=1; i<parts.size(); i++) {
        str += parts.elementAt(i); 
        //System.out.print(i + " : " + parts.elementAt(i) + "\n");
    }

    return str;
}

public static void main(String args[]) {
    String string1;
    String string2;
    String expected;

    string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
    string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
    expected = "<NOUN>Sri Lanka National Chess Championship</NOUN> this year and represented <NOUN>Sri Lanka</NOUN> at represented <NOUN>Sri Lanka</NOUN> Universities at the <NOUN>World University Chess</NOUN> Championships.";


    Pattern p = Pattern.compile("<NOUN>(.*?)</NOUN>");
    Matcher m = p.matcher(string2);
    Vector<String> parts = new Vector<String>();
    parts.add(string1);

    while(m.find()) {
        for(int i=0; i<parts.size(); i++) {

            // skip parts that have already been tagged
            if(parts.elementAt(i).indexOf("<NOUN>")!=-1) {
                continue;
            }

            // search for pattern
            String cur = parts.elementAt(i);
            int disp = cur.indexOf(m.group(1));
            if(disp==-1) {
                continue;
            } else {
                parts.remove(i);
                Vector<String> newParts = new Vector<String>();

                if(disp!=0) {
                    newParts.add(cur.substring(0, disp));
                }

                newParts.add(m.group(0));

                if((disp+m.group(1).length())!=cur.length()) {
                    newParts.add(cur.substring(disp+m.group(1).length()));
                }

                // Always insert at position i so the parts stay in order;
                // appending would misplace them when i == 0 and other parts follow.
                parts.addAll(i, newParts);

                //System.out.print(print(parts) + "\n");
            }           
        }
    }

    string1 = print(parts);
    if(!string1.equals(expected)) {
        System.out.println("Unexpected output !!");
    } else {
        System.out.println("Correct !!");
    }
}

}

The print method only concatenates the parts and does not print anything, so you may want to rename it to stringify.
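
If the nouns in string2 are listed in the same order in which they occur in string1 (true for the example in the question, but an assumption in general), the same idea can be sketched as a single left-to-right pass with a cursor, so text that has already been consumed is never searched again. The names OrderedTagger and tag are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderedTagger {

    // Assumes the nouns in 'tagged' appear in the same order as in 'text'.
    static String tag(String text, String tagged) {
        Matcher m = Pattern.compile("<NOUN>(.*?)</NOUN>").matcher(tagged);
        StringBuilder out = new StringBuilder();
        int cursor = 0; // everything before this index has been emitted
        while (m.find()) {
            String noun = m.group(1);
            int at = text.indexOf(noun, cursor);
            if (at < 0) {
                continue; // noun not found after the cursor; leave it untagged
            }
            out.append(text, cursor, at); // untouched text before the noun
            out.append(m.group(0));       // the noun together with its tags
            cursor = at + noun.length();
        }
        out.append(text.substring(cursor)); // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        String string1 = "Sri Lanka National Chess Championship this year and represented Sri Lanka at represented Sri Lanka Universities at the World University Chess Championships.";
        String string2 = "<NOUN>Sri Lanka National Chess Championship</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>Sri Lanka</NOUN> <NOUN>World University Chess</NOUN>";
        System.out.println(tag(string1, string2));
    }
}
```

Because each noun is matched with indexOf rather than a regex, no quoting of special characters is needed, and each entry in string2 tags exactly one occurrence in string1.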

Upvotes: 0
