ShadyBears

Reputation: 4185

Java Split regex

Given a string S, find the number of words in that string. For this problem a word is defined by a string of one or more English letters.

Note: Space or any of the special characters like ![,?._'@+] will act as a delimiter.

Input Format: The string will only contain lower case English letters, upper case English letters, spaces, and these special characters: ![,?._'@+].

Output Format: On the first line, print the number of words in the string. The words don't need to be unique. Then, print each word in a separate line.

My code:

    Scanner sc = new Scanner(System.in);
    String str = sc.nextLine();
    String regex = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";
    String[] arr = str.split(regex);
    
    System.out.println(arr.length);
    
    for(int i = 0; i < arr.length; i++)
        System.out.println(arr[i]);

When I submit the code, it passes just over half of the test cases, and I can't see what those test cases are. In the spirit of Murphy's law: in which situations does the regex I implemented fail?

Upvotes: 2

Views: 136

Answers (1)

Szymon

Reputation: 43023

You haven't escaped some of the special characters in your regex. Let's start with [ and ]. Because they are not escaped, the part [|,|?|.|_|'|@|+|] is treated as a character class containing the characters |,?._'@+. This means that your regex doesn't split on [ or ].

For example, x..]y+[z is split into x, ]y and [z.
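
To see this for yourself, you can run the pattern from the question against that input. This is only a small sketch; the class name OriginalRegexDemo and the helper splitWithOriginal are made up for the illustration.

```java
import java.util.Arrays;

class OriginalRegexDemo {
    // The pattern from the question, unchanged. The unescaped [ ... ] forms a
    // character class, so a literal '[' or ']' is never treated as a delimiter.
    static final String ORIGINAL = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";

    static String[] splitWithOriginal(String s) {
        return s.split(ORIGINAL);
    }

    public static void main(String[] args) {
        // Prints [x, ]y, [z] : the brackets stay attached to the words.
        System.out.println(Arrays.toString(splitWithOriginal("x..]y+[z")));
    }
}
```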

You can fix that by escaping those two characters. Once the brackets no longer form a character class, ?, . and + become regex metacharacters again, so they need escaping too, and you end up with a proper definition:

    String regex = "( |!|\\[|,|\\?|\\.|_|'|@|\\+|\\])+";

Note that instead of defining alternatives, you could use a character class, which makes the regex easier to read (remember to include the space in it):

    String regex = "[ !\\[,?._'@+\\]]+";

In this case you only need to escape [ and ].
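
As a rough check, the set-based pattern can be tried on the earlier input. The version in this sketch includes the space in the set, since the problem statement lists it as a delimiter; CharClassDemo and splitWords are hypothetical names used only here.

```java
import java.util.Arrays;

class CharClassDemo {
    // Delimiters from the problem statement, space included.
    // Inside [...] only '[' and ']' need escaping in this pattern.
    static final String DELIMS = "[ !\\[,?._'@+\\]]+";

    static String[] splitWords(String s) {
        return s.split(DELIMS);
    }

    public static void main(String[] args) {
        // Prints [x, y, z] : the brackets are now treated as delimiters.
        System.out.println(Arrays.toString(splitWords("x..]y+[z")));
        // Prints [Hi, there, you]
        System.out.println(Arrays.toString(splitWords("Hi there, you")));
    }
}
```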

UPDATE:

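There's also a problem with a leading special character (as in your example ".Hi?there[broski.]@@@@@"). You need to split on it, but that produces an empty string as the first element of the result. I don't think there's a way to use split without producing it, but you can work around it by removing the leading delimiters before splitting, using the same regex anchored to the start of the string:

    String[] arr = str.replaceFirst("^" + regex, "").split(regex);

Without the ^ anchor, replaceFirst would remove the first run of delimiters wherever it occurs, so a string that starts with a letter would have its first two words joined (Hi?there would become Hithere).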
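
Putting the pieces together, a minimal sketch might look like the one below. It anchors the leading-delimiter removal with ^ so that only a delimiter run at the very start is stripped, and it treats an input with no letters as zero words, because "".split(...) returns an array holding one empty string. WordSplitter and words are names chosen for the illustration, not anything from the original problem.

```java
class WordSplitter {
    // Delimiters from the problem statement, written as a character class.
    static final String DELIMS = "[ !\\[,?._'@+\\]]+";

    static String[] words(String s) {
        // Strip a delimiter run only if it sits at the very start of the input.
        String trimmed = s.replaceFirst("^" + DELIMS, "");
        // "".split(...) yields [""], so handle the no-words case explicitly.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // split drops trailing empty strings, so trailing delimiters are harmless.
        return trimmed.split(DELIMS);
    }

    public static void main(String[] args) {
        String[] arr = words(".Hi?there[broski.]@@@@@");
        System.out.println(arr.length);   // 3
        for (String w : arr) {
            System.out.println(w);        // Hi, there, broski (one per line)
        }
    }
}
```

The explicit empty check matters for inputs such as "@@@" or a blank line, where the expected count is 0 rather than 1.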

Upvotes: 1
