Sebastian Zeki

Reputation: 6874

How to find and replace without causing duplication

I have a series of plain-text reports containing fields such as "Contractile Front velocity".

Some of them contain "Contractile Front velocitycms" instead. There are other terms like this where a suffix such as cms has been appended.

Each term has a numerical result associated with it, and I am trying to put the term and the result into a database. The database field will be (for this example) "Contractile Front velocitycms".

So I would like to convert any field in a report that does not already end in cms to Contractile Front velocitycms.

Because I have a lot of find-and-replace problems to solve, I created a method that uses StringUtils.replaceEach so that I can use a simple colon-separated text file as a lookup dictionary for the find and replace.

// Needs java.io.*, java.util.ArrayList, java.util.Arrays and Commons Lang StringUtils
public static String FindNReplace(String n) throws IOException {
    ArrayList<String> orig = new ArrayList<String>();    // text to search for
    ArrayList<String> newDoc = new ArrayList<String>();  // text to substitute

    String dictionary="/Users/sebastianzeki/Documents/workspace/PhysiologyUpperGITotalExtractorv2/src/Overview/FindNReplaceDictionary.txt";

    // try-with-resources closes the reader even if reading fails
    try (BufferedReader br = new BufferedReader(new FileReader(dictionary))) {
        String line;
        while ((line = br.readLine()) != null) {
            // Each dictionary line has the form replacement:original
            String[] split = line.split(":");
            if (split.length < 2) {
                continue; // skip blank or malformed lines
            }
            // Print the array contents rather than the array reference
            System.out.println(Arrays.toString(split));
            orig.add(split[1]);
            newDoc.add(split[0]);
        }
    }

    String[] orig_arr = orig.toArray(new String[orig.size()]);
    String[] newDoc_arr = newDoc.toArray(new String[newDoc.size()]);

    // replaceEach matches literal text only, not regular expressions
    return StringUtils.replaceEach(n, orig_arr, newDoc_arr);
}

The dictionary looks like this (replacement first, then the text to find):

PostPr :Post-Prandial
PostPr :Post-prandial
Nausea :nausea

The problem is that if I just use my dictionary to replace Contractile Front velocity with Contractile Front velocitycms, then wherever Contractile Front velocitycms already exists I get Contractile Front velocitycmscms, and replaceEach does not use regex. Can anyone think of a way to avoid these duplicates?

Upvotes: 2

Views: 81

Answers (1)

Stephen P

Reputation: 14810

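What you want is a negative lookahead to exclude the trailing part.
A negative lookahead is written as (?!pattern), so in your case you want Contractile Front velocity(?!cms) as your pattern to match. Because StringUtils.replaceEach only matches literal text, this needs a regex-aware method such as String.replaceAll.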
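To illustrate what the lookahead does, here is a minimal Java sketch using java.util.regex. The class name LookaheadDemo, the helper needsSuffix, and the sample strings are made up for this example and are not from the question.

```java
import java.util.regex.Pattern;

public class LookaheadDemo {
    // The pattern from the answer: the term, not immediately followed by "cms".
    static final Pattern WITHOUT_CMS =
            Pattern.compile("Contractile Front velocity(?!cms)");

    // Returns true when the text contains the term without the "cms" suffix.
    // (Hypothetical helper, named here only for illustration.)
    static boolean needsSuffix(String text) {
        return WITHOUT_CMS.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(needsSuffix("Contractile Front velocity 3.2"));    // true
        System.out.println(needsSuffix("Contractile Front velocitycms 3.2")); // false
    }
}
```

The lookahead consumes no characters, so a match covers only the term itself and whatever follows it is left untouched.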

You can try this on RegexPlanet. I used:
Regular expression: Contractile Front velocity(?!cms)
Input 1: This Contractile Front velocitycms already has it.
Input 2: But this Contractile Front velocity does not.

You'll see when you hit the Test button that Input 2 gets the "cms" added to it but Input 1 does not get it doubled.
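The same check can be reproduced in Java. The sketch below assumes each dictionary entry can be split into a base term and a suffix; the class SuffixAdder and the method addSuffixIfMissing are hypothetical names, not part of the original code. Pattern.quote is used so that terms containing regex metacharacters are still matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SuffixAdder {
    // Appends suffix to every occurrence of term that is not already followed by it.
    static String addSuffixIfMissing(String text, String term, String suffix) {
        // Pattern.quote makes term and suffix literal, mirroring replaceEach's behaviour.
        Pattern p = Pattern.compile(
                Pattern.quote(term) + "(?!" + Pattern.quote(suffix) + ")");
        // quoteReplacement guards against '$' or '\' in the replacement text.
        return p.matcher(text).replaceAll(Matcher.quoteReplacement(term + suffix));
    }

    public static void main(String[] args) {
        String term = "Contractile Front velocity";
        // Already has the suffix, so it is left unchanged.
        System.out.println(addSuffixIfMissing(
                "This Contractile Front velocitycms already has it.", term, "cms"));
        // Lacks the suffix, so "cms" is appended once.
        System.out.println(addSuffixIfMissing(
                "But this Contractile Front velocity does not.", term, "cms"));
    }
}
```

Calling such a helper once per dictionary entry would replace the single replaceEach call for the terms that need a suffix; entries that are plain substitutions could still go through replaceEach.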

Upvotes: 1
