SRSR333

Reputation: 266

Modify the characters of words in a Java string with punctuation, but keep the positions of said punctuation?

For instance, take the following list of Strings, disregarding the inverted commas:

"Hello"
"Hello!"
"I'm saying Hello!"
"I haven't said hello yet, but I will."

Now let's say I'd like to perform a certain operation on the characters of each word. For instance, say I'd like to reverse the characters, but keep the positions of the punctuation. So the result would be:

"olleH"
"olleH!"
"m'I gniyas olleH!"
"I tneva'h dias olleh tey, tub I lliw."

Ideally I'd like my code to be independent of the operation performed on the string (another example would be a random shuffling of letters), and independent of all punctuation, so hyphens, apostrophes, commas, full stops, en/em dashes, etc. all remain in their original positions after the operation is performed. This probably requires some form of regular expressions.

For this, I was thinking that I should save the indices and characters of all punctuation in a given word, perform the operation, and then re-insert all punctuation at their correct positions. However, I can't think of a way to do this, or a class to use.
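A minimal sketch of that save-and-re-insert idea for a single word might look like the following. The class and method names (`PunctuationPreserver`, `transformWord`) are hypothetical, and the sketch assumes the operation returns as many characters as it received and that the word contains no surrogate pairs, since it indexes by `char`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class PunctuationPreserver {

    // Hypothetical helper (not from the question): applies op to the letters
    // of one word and re-inserts every non-letter at its original index.
    // Assumes op preserves length and that the word has no surrogate pairs.
    static String transformWord(String word, UnaryOperator<String> op) {
        List<Integer> positions = new ArrayList<>(); // indices of non-letters
        StringBuilder marks = new StringBuilder();   // the non-letters themselves
        StringBuilder letters = new StringBuilder(); // the letters, in order
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isLetter(c)) {
                letters.append(c);
            } else {
                positions.add(i);
                marks.append(c);
            }
        }
        StringBuilder out = new StringBuilder(op.apply(letters.toString()));
        // Ascending order matters: each insertion restores the prefix up to
        // that index, so the next saved index is valid again.
        for (int k = 0; k < positions.size(); k++) {
            int pos = positions.get(k);
            out.insert(pos, marks.charAt(k));
        }
        return out.toString();
    }
}
```

With a reversing operation such as `w -> new StringBuilder(w).reverse().toString()`, `transformWord("haven't", ...)` gives `tneva'h`.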

I have a first attempt, but this unfortunately does not work with punctuation, which is the key:

jshell> String str = "I haven't said hello yet, but I will."
str ==> "I haven't said hello yet, but I will."

jshell> Arrays.stream(str.split("\\s+")).map(x -> (new StringBuilder(x)).reverse().toString()).reduce((x, y) -> x + " " + y).get()
$2 ==> "I t'nevah dias olleh ,tey tub I .lliw"

Has anyone got an idea how I might fix this? Thanks very much. There's no need for full working code; maybe just a signpost to an appropriate class I could use to perform this operation.

Upvotes: 3

Views: 448

Answers (2)

Andreas

Reputation: 159096

There is no need to use regex for this, and you certainly shouldn't use split("\\s+"): it collapses consecutive whitespace and discards which whitespace characters were used, so the spacing of the result could be incorrect.
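To illustrate the whitespace problem, here is a small sketch that splits on whitespace runs and re-joins with single spaces, roughly what the attempt in the question does. The class and method names are made up for the example.

```java
class WhitespaceLossDemo {

    // Splits on runs of whitespace and re-joins with a single space,
    // mirroring the split("\\s+") / reduce approach from the question.
    static String splitAndRejoin(String input) {
        return String.join(" ", input.split("\\s+"));
    }
}
```

For the input `"a  b\tc\n"` this returns `"a b c"`: the double space, the tab and the trailing newline are all gone.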

You also shouldn't use charAt() or anything like it, since that would not support letters from the Unicode supplementary planes, i.e. Unicode characters that are stored in Java strings as surrogate pairs.
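As a quick illustration of why charAt() is risky, the sketch below reverses a string once per UTF-16 char and once per code point. Both helper names are hypothetical and exist only for this comparison.

```java
class SurrogatePairDemo {

    // Naive reversal over UTF-16 code units via charAt(); shown only to
    // illustrate the problem, since it splits surrogate pairs apart.
    static String reverseByChar(String s) {
        char[] out = new char[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[s.length() - 1 - i] = s.charAt(i);
        }
        return new String(out);
    }

    // Reversal over code points keeps each surrogate pair intact.
    static String reverseByCodePoint(String s) {
        int[] cps = s.codePoints().toArray();
        for (int i = 0, j = cps.length - 1; i < j; i++, j--) {
            int tmp = cps[i];
            cps[i] = cps[j];
            cps[j] = tmp;
        }
        return new String(cps, 0, cps.length);
    }
}
```

For the three-char string `"\uD835\uDCD0b"` (one supplementary letter followed by `b`), `reverseByCodePoint` returns `"b\uD835\uDCD0"`, whereas `reverseByChar` returns `"b\uDCD0\uD835"`, in which the two surrogates are in the wrong order and no longer form a valid character.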

Basic logic:

  • Locate start of word, i.e. start of string or first character following whitespace.
  • Locate end of word, i.e. last character preceding whitespace or end of string.
  • Iterating from beginning and end in parallel:
    • Skip characters that are not letters.
    • Swap the letters.

As Java code, with full Unicode support:

public static String reverseLettersOfWords(String input) {
    int[] codePoints = input.codePoints().toArray();
    for (int i = 0, start = 0; i <= codePoints.length; i++) {
        if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
            for (int end = i - 1; ; start++, end--) {
                while (start < end && ! Character.isLetter(codePoints[start]))
                    start++;
                while (start < end && ! Character.isLetter(codePoints[end]))
                    end--;
                if (start >= end)
                    break;
                int tmp = codePoints[start];
                codePoints[start] = codePoints[end];
                codePoints[end] = tmp;
            }
            start = i + 1;
        }
    }
    return new String(codePoints, 0, codePoints.length);
}

Test

System.out.println(reverseLettersOfWords("Hello"));
System.out.println(reverseLettersOfWords("Hello!"));
System.out.println(reverseLettersOfWords("I'm saying Hello!"));
System.out.println(reverseLettersOfWords("I haven't said hello yet, but I will."));
System.out.println(reverseLettersOfWords("Works with surrogate pairs: 𝓐𝓑𝓒+𝓓 "));

Output

olleH
olleH!
m'I gniyas olleH!
I tneva'h dias olleh tey, tub I lliw.
skroW htiw etagorrus sriap: 𝓓𝓒𝓑+𝓐 

Note that the special letters at the end are the first four capital letters of the "Script (or Calligraphy)", "Bold" mathematical alphabet, e.g. the 𝓐 is Unicode Character 'MATHEMATICAL BOLD SCRIPT CAPITAL A' (U+1D4D0), which in Java is the two chars "\uD835\uDCD0".


UPDATE

The above implementation is optimized for reversing the letters of the word. To apply an arbitrary operation to mangle the letters of the word, use the following implementation:

public static String mangleLettersOfWords(String input) {
    int[] codePoints = input.codePoints().toArray();
    for (int i = 0, start = 0; i <= codePoints.length; i++) {
        if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
            int wordCodePointLen = 0;
            for (int j = start; j < i; j++)
                if (Character.isLetter(codePoints[j]))
                    wordCodePointLen++;
            if (wordCodePointLen != 0) {
                int[] wordCodePoints = new int[wordCodePointLen];
                for (int j = start, k = 0; j < i; j++)
                    if (Character.isLetter(codePoints[j]))
                        wordCodePoints[k++] = codePoints[j];
                int[] mangledCodePoints = mangleWord(wordCodePoints.clone());
                if (mangledCodePoints.length != wordCodePointLen)
                    throw new IllegalStateException("Mangled word is wrong length: '" + new String(wordCodePoints, 0, wordCodePoints.length) + "' (" + wordCodePointLen + " code points)" +
                                                                " vs mangled '" + new String(mangledCodePoints, 0, mangledCodePoints.length) + "' (" + mangledCodePoints.length + " code points)");
                for (int j = start, k = 0; j < i; j++)
                    if (Character.isLetter(codePoints[j]))
                        codePoints[j] = mangledCodePoints[k++];
            }
            start = i + 1;
        }
    }
    return new String(codePoints, 0, codePoints.length);
}
private static int[] mangleWord(int[] codePoints) {
    return mangleWord(new String(codePoints, 0, codePoints.length)).codePoints().toArray();
}
private static CharSequence mangleWord(String word) {
    return new StringBuilder(word).reverse();
}

You can of course replace the hardcoded call to either mangleWord method with a call to a passed-in Function<int[], int[]> or Function<String, ? extends CharSequence> parameter, if needed.
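A rough sketch of that passed-in variant could look like this. It uses UnaryOperator<int[]> (a Function<int[], int[]>) as the parameter type; the class name `WordMangler` and the `reverse` helper are assumptions made for the example, and the scanning loop is the same as above.

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;

class WordMangler {

    // Same word-scanning loop as above, but the operation is a parameter.
    // The operator receives only the letters of one word and must return
    // the same number of code points.
    static String mangleLettersOfWords(String input, UnaryOperator<int[]> mangler) {
        int[] codePoints = input.codePoints().toArray();
        for (int i = 0, start = 0; i <= codePoints.length; i++) {
            if (i == codePoints.length || Character.isWhitespace(codePoints[i])) {
                // Extract the letters of the word in [start, i).
                int[] letters = Arrays.stream(codePoints, start, i)
                                      .filter(Character::isLetter)
                                      .toArray();
                if (letters.length > 0) {
                    int[] mangled = mangler.apply(letters.clone());
                    if (mangled.length != letters.length)
                        throw new IllegalStateException("Mangled word is wrong length");
                    // Write the mangled letters back into the letter slots only.
                    for (int j = start, k = 0; j < i; j++)
                        if (Character.isLetter(codePoints[j]))
                            codePoints[j] = mangled[k++];
                }
                start = i + 1;
            }
        }
        return new String(codePoints, 0, codePoints.length);
    }

    // Example operator: reverses the code points in place.
    static int[] reverse(int[] cps) {
        for (int i = 0, j = cps.length - 1; i < j; i++, j--) {
            int tmp = cps[i];
            cps[i] = cps[j];
            cps[j] = tmp;
        }
        return cps;
    }
}
```

Calling `WordMangler.mangleLettersOfWords("I haven't said hello yet, but I will.", WordMangler::reverse)` returns `I tneva'h dias olleh tey, tub I lliw.`, and any other length-preserving operator can be passed in the same way.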

The result with that implementation of the mangleWord method(s) is the same as the original implementation, but you can now easily implement a different mangling algorithm.

E.g. to randomize the letters, simply shuffle the codePoints array:

// Requires: import java.util.Random;
private static int[] mangleWord(int[] codePoints) {
    Random rnd = new Random();
    for (int i = codePoints.length - 1; i > 0; i--) {
        int j = rnd.nextInt(i + 1);
        int tmp = codePoints[j];
        codePoints[j] = codePoints[i];
        codePoints[i] = tmp;
    }
    return codePoints;
}

Sample Output

Hlelo
Hlleo!
m'I nsayig oHlel!
I athen'v siad eohll yte, btu I illw.
srWok twih rueoatrsg rpasi: 𝓑𝓒𝓐+𝓓

Upvotes: 4

Malcolm Crum

Reputation: 4879

I suspect there's a more efficient solution but here's a naive one:

  1. Split sentence into words on spaces (note - if you have multiple spaces my implementation will have problems)
  2. Strip punctuation
  3. Reverse each word
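  4. Go through each character of the original word: keep punctuation where it is, and otherwise insert the next character from the reversed word

import java.util.Arrays;
import java.util.stream.Collectors;

public class Reverser {

    public String reverseSentence(String sentence) {
        // Only a single space is treated as a word separator.
        String[] words = sentence.split(" ");
        return Arrays.stream(words).map(this::reverseWord).collect(Collectors.joining(" "));
    }

    private String reverseWord(String word) {
        // Collect the letters and digits using the same test as the loop below,
        // so both passes agree on what counts as punctuation.
        StringBuilder kept = new StringBuilder();
        for (int i = 0; i < word.length(); ++i) {
            if (isLetterOrDigit(word.charAt(i))) {
                kept.append(word.charAt(i));
            }
        }
        String reversed = kept.reverse().toString();
        StringBuilder result = new StringBuilder();
        int next = 0; // index of the next unused character in reversed
        for (int i = 0; i < word.length(); ++i) {
            char ch = word.charAt(i);
            // Punctuation stays in place; every other slot takes the next reversed character.
            // A separate index is needed so words with several punctuation marks stay aligned.
            result.append(isLetterOrDigit(ch) ? reversed.charAt(next++) : ch);
        }
        return result.toString();
    }

    private static boolean isLetterOrDigit(char ch) {
        return Character.isAlphabetic(ch) || Character.isDigit(ch);
    }
}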

Upvotes: 1
