Ray Chen
Ray Chen

Reputation: 99

Replace nested string with some rules

There are 3 rules in the string:

  1. It contains either word or group (enclosed by parentheses), and group can be nested;

  2. If there is a space between word or group, those words or groups should append with "+".

For example:

"a b" needs to be "+a +b"
"a (b c)" needs to be "+a +(+b +c)"
  1. If there is a | between word or group, those words or groups should be surround with parentheses. For example:
"a|b" needs to be "(a b)"
"a|b|c" needs to be "(a b c)"

Consider all the rules, here is another example:

"aa|bb|(cc|(ff gg)) hh" needs to be "+(aa bb (cc (+ff +gg))) +hh"

I have tried to use regex, stack and recursive descent parser logic, but still cannot fully solve the problem.

Could anyone please share the logic or pseudo code on this problem?

New edited: One more important rule: vertical bar has higher precedence.

For example:

aa|bb hh cc|dd (a|b) needs to be +(aa bb) +hh +(cc dd) +((a b))

(aa dd)|bb|cc (ee ff)|(gg hh) needs to be +((+aa +dd) bb cc) +((+ee +ff) (+gg +hh))

New edited: To solve the precedence problem, I find a way to add the parentheses before calling Sunil Dabburi's methods.

For example:

aa|bb hh cc|dd (a|b) will be (aa|bb) hh (cc|dd) (a|b)

(aa dd)|bb|cc (ee ff)|(gg hh) will be ((aa dd)|bb|cc) ((ee ff)|(gg hh))

Since the performance is not a big concern to my application, this way at least make it work for me. I guess the JavaCC tool may solve this problem beautifully. Hope someone else can continue to discuss and contribute this problem.

Upvotes: 3

Views: 181

Answers (2)

s.nguyen
s.nguyen

Reputation: 150

Here is my attempt. Based on your examples and a few that I came up with I believe it is correct under the rules. I solved this by breaking the problem up into 2 parts.

  1. Solving the case where I assume the string only contains words or is a group with only words.
  2. Solving words and groups by substituting child groups out, use the 1) part and recursively repeating 2) with the child groups.
    private String transformString(String input) {
        Stack<Pair<Integer, String>> childParams = new Stack<>();
        String parsedInput = input;
        int nextInt = Integer.MAX_VALUE;
        Pattern pattern = Pattern.compile("\\((\\w|\\|| )+\\)");
        Matcher matcher = pattern.matcher(parsedInput);
        while (matcher.find()) {
            nextInt--;
            parsedInput = matcher.replaceFirst(String.valueOf(nextInt));
            String childParam = matcher.group();
            childParams.add(Pair.of(nextInt, childParam));
            matcher = pattern.matcher(parsedInput);
        }

        parsedInput = transformBasic(parsedInput);
        while (!childParams.empty()) {
            Pair<Integer, String> childGroup = childParams.pop();
            parsedInput = parsedInput.replace(childGroup.fst.toString(), transformBasic(childGroup.snd));
        }
        return parsedInput;
    }

    // Transform basic only handles strings that contain words. This allows us to simplify the problem
    // and not have to worry about child groups or nested groups.
    private String transformBasic(String input) {
        String transformedBasic = input;
        if (input.startsWith("(")) {
            transformedBasic = input.substring(1, input.length() - 1);
        }

        // Append + in front of each word if there are multiple words.
        if (transformedBasic.contains(" ")) {
            transformedBasic = transformedBasic.replaceAll("( )|^", "$1+");
        }
        // Surround all words containing | with parenthesis.
        transformedBasic = transformedBasic.replaceAll("([\\w]+\\|[\\w|]*[\\w]+)", "($1)");
        // Replace pipes with spaces.
        transformedBasic = transformedBasic.replace("|", " ");
        if (input.startsWith("(") && !transformedBasic.startsWith("(")) {
            transformedBasic = "(" + transformedBasic + ")";
        }
        return transformedBasic;
    }

Verified with the following test cases:

@ParameterizedTest
    @CsvSource({
            "a b,+a +b",
            "a (b c),+a +(+b +c)",
            "a|b,(a b)",
            "a|b|c,(a b c)",
            "aa|bb|(cc|(ff gg)) hh,+(aa bb (cc (+ff +gg))) +hh",
            "(aa(bb(cc|ee)|ff) gg),(+aa(bb(cc ee) ff) +gg)",
            "(a b),(+a +b)",
            "(a(c|d) b),(+a(c d) +b)",
            "bb(cc|ee),bb(cc ee)",
            "((a|b) (a b)|b (c|d)|e),(+(a b) +((+a +b) b) +((c d) e))"
    })
    void testTransformString(String input, String output) {
        Assertions.assertEquals(output, transformString(input));
    }

    @ParameterizedTest
    @CsvSource({
            "a b,+a +b",
            "a b c,+a +b +c",
            "a|b,(a b)",
            "(a b),(+a +b)",
            "(a|b),(a b)",
            "a|b|c,(a b c)",
            "(aa|bb cc|dd),(+(aa bb) +(cc dd))",
            "(aa|bb|ee cc|dd),(+(aa bb ee) +(cc dd))",
            "aa|bb|cc|ff gg hh,+(aa bb cc ff) +gg +hh"
    })
    void testTransformBasic(String input, String output) {
        Assertions.assertEquals(output, transformBasic(input));
    }

Upvotes: 2

Sunil Dabburi
Sunil Dabburi

Reputation: 1472

I tried to solve the problem. Not sure if it works in all cases. Verified with the inputs given in the question and it worked fine.

  1. We need to format the pipes first. That will help add necessary parentheses and spacing.
  2. The spaces generated as part of pipe processing can interfere with actual spaces that are available in our expression. So used $ symbol to mask them.
  3. To process spaces, its tricky as parantheses need to be processed individually. So the approach I am following is to find a set of parantheses starting from outside and going inside.

So typically we have <left_part><parantheses_code><right_part>. Now left_part can be empty, similary right_part can be empty. we need to handle such cases. Also, if the right_part starts with a space, we need to add '+' to left_part as per space requirement.

NOTE: I am not sure what's expected of (a|b). If the result should be ((a b)) or (a b). I am going with ((a b)) purely by the definition of it.

Now here is the working code:

public class Test {

    public static void main(String[] args) {
        String input = "aa|bb hh cc|dd (a|b)";
        String result = formatSpaces(formatPipes(input)).replaceAll("\\$", " ");
        System.out.println(result);
    }

    private static String formatPipes(String input) {

        while (true) {
            char[] chars = input.toCharArray();
            int pIndex = input.indexOf("|");

            if (pIndex == -1) {
                return input;
            }

            input = input.substring(0, pIndex) + '$' + input.substring(pIndex + 1);

            int first = pIndex - 1;
            int closeParenthesesCount = 0;
            while (first >= 0) {

                if (chars[first] == ')') {
                    closeParenthesesCount++;
                }

                if (chars[first] == '(') {
                    if (closeParenthesesCount > 0) {
                        closeParenthesesCount--;
                    }
                }

                if (chars[first] == ' ') {
                    if (closeParenthesesCount == 0) {
                        break;
                    }
                }
                first--;
            }

            String result;

            if (first > 0) {
                result = input.substring(0, first + 1) + "(";
            } else {
                result = "(";
            }

            int last = pIndex + 1;
            int openParenthesesCount = 0;
            while (last <= input.length() - 1) {

                if (chars[last] == '(') {
                    openParenthesesCount++;
                }

                if (chars[last] == ')') {
                    if (openParenthesesCount > 0) {
                        openParenthesesCount--;
                    }
                }

                if (chars[last] == ' ') {
                    if (openParenthesesCount == 0) {
                        break;
                    }
                }
                last++;
            }

            if (last >= input.length() - 1) {
                result = result + input.substring(first + 1) + ")";
            } else {
                result = result + input.substring(first + 1, last) + ")" + input.substring(last);
            }

            input = result;
        }
    }

    private static String formatSpaces(String input) {

        if (input.isEmpty()) {
            return "";
        }

        int startIndex = input.indexOf("(");
        if (startIndex == -1) {
            if (input.contains(" ")) {
                String result = input.replaceAll(" ", " +");
                if (!result.trim().startsWith("+")) {
                    result = '+' + result;
                }
                return result;
            } else {
                return input;
            }
        }

        int endIndex = startIndex + matchingCloseParenthesesIndex(input.substring(startIndex));

        if (endIndex == -1) {
            System.out.println("Invalid input!!!");
            return "";
        }

        String first = "";
        String last = "";
        if (startIndex > 0) {
            first = input.substring(0, startIndex);
        }
        if (endIndex < input.length() - 1) {
            last = input.substring(endIndex + 1);
        }

        String result = formatSpaces(first);
        String parenthesesStr = input.substring(startIndex + 1, endIndex);
        if (last.startsWith(" ") && first.isEmpty()) {
            result = result + "+";
        }

        result = result + "("
                + formatSpaces(parenthesesStr)
                + ")"
                + formatSpaces(last);

        return result;
    }

    private static int matchingCloseParenthesesIndex(String input) {
        int counter = 1;
        char[] chars = input.toCharArray();
        for (int i = 1; i < chars.length; i++) {
            char ch = chars[i];
            if (ch == '(') {
                counter++;
            } else if (ch == ')') {
                counter--;
            }

            if (counter == 0) {
                return i;
            }
        }
        return -1;
    }

}

Upvotes: 2

Related Questions