Reputation: 4185
Given a string S, find the number of words in that string. For this problem a word is defined as a string of one or more English letters.
Note: Space or any of the special characters like ![,?._'@+] will act as a delimiter.
Input Format: The string will only contain lower case English letters, upper case English letters, spaces, and these special characters: ![,?._'@+].
Output Format: On the first line, print the number of words in the string. The words don't need to be unique. Then, print each word in a separate line.
My code:
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
String regex = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";
String[] arr = str.split(regex);
System.out.println(arr.length);
for (int i = 0; i < arr.length; i++)
    System.out.println(arr[i]);
When I submit the code, it passes just over half of the test cases. I do not know what the test cases are, so I'm asking in the spirit of Murphy's law: in which situations will the regex I implemented fail?
Upvotes: 2
Views: 136
Reputation: 43023
You don't escape some special characters in your regex. Let's start with [ and ]. Since you don't escape them, the part [|,|?|.|_|'|@|+|] is treated as a character class containing the characters |,?._'@+. This means that your regex doesn't split on [ and ].
For example, x..]y+[z is split into x, ]y and [z.
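To see this behaviour directly, here is a small sketch that runs the regex from the question, unchanged, against the example input. The class and method names (OriginalRegexDemo, splitWithOriginal) are made up for illustration.

```java
import java.util.Arrays;

final class OriginalRegexDemo {
    // The regex from the question, copied as-is. The unescaped [ ... ]
    // forms a character class, so '[' and ']' themselves are not delimiters.
    static final String ORIGINAL = "( |!|[|,|?|.|_|'|@|+|]|\\\\)+";

    // Hypothetical helper: split the input with the unmodified regex.
    static String[] splitWithOriginal(String input) {
        return input.split(ORIGINAL);
    }

    public static void main(String[] args) {
        // The brackets stay attached to the neighbouring letters.
        System.out.println(Arrays.toString(splitWithOriginal("x..]y+[z")));
    }
}
```

Any input that contains [ or ] next to a word would therefore produce "words" with brackets in them, which is likely one source of the failing test cases.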
You can fix that by escaping those characters. Once the brackets are escaped they no longer form a character class, so ?, . and + become metacharacters and have to be escaped as well. You end up with a proper definition:
String regex = "( |!|\\[|,|\\?|\\.|_|'|@|\\+|\\])+";
Note that instead of listing alternatives, you could use a character class, which makes the regex easier to read:
String regex = "[ !\\[,?._'@+\\]]+";
In this case you only need to escape [ and ].
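As a quick check, the sketch below compares the alternation form with a character-class form on one sample string. It assumes the space is meant to be part of the class, as it is in the alternation; the class name DelimiterSetDemo and the sample input are invented for the example.

```java
import java.util.Arrays;

final class DelimiterSetDemo {
    // Alternation form: every metacharacter is escaped individually.
    static final String ALTERNATION = "( |!|\\[|,|\\?|\\.|_|'|@|\\+|\\])+";
    // Character-class form: only '[' and ']' need escaping inside the class.
    static final String SET = "[ !\\[,?._'@+\\]]+";

    static String[] splitWith(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String sample = "Hello, world! [one_two]+three";
        System.out.println(Arrays.toString(splitWith(ALTERNATION, sample)));
        System.out.println(Arrays.toString(splitWith(SET, sample)));
    }
}
```

Both patterns should give the same five words for this input, so the choice between them is mostly about readability.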
UPDATE:
There's also a problem with a leading special character (as in your example ".Hi?there[broski.]@@@@@"). The string has to be split on it, but that produces an empty string as the first element of the result. I don't think split can avoid this on its own, but you can work around it by stripping the leading delimiters before splitting. Anchor the pattern with ^ so that only a match at the very start is removed; an unanchored replaceFirst would remove the first delimiter anywhere in the string and glue two words together:
String[] arr = str.replaceFirst("^" + regex, "").split(regex);
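Putting the pieces together, a complete program might look like the sketch below. The names WordCount and words are hypothetical, and the explicit check for an empty result is an addition not covered above: "".split(...) returns an array holding one empty string, so an input with no letters at all would otherwise be reported as one word.

```java
import java.util.Scanner;

final class WordCount {
    // Delimiters from the problem statement, written as a character class.
    static final String DELIMITERS = "[ !\\[,?._'@+\\]]+";

    // Hypothetical helper: returns the words of s in order of appearance.
    static String[] words(String s) {
        // Remove a leading run of delimiters only; "^" anchors to the start.
        String trimmed = s.replaceFirst("^" + DELIMITERS, "");
        // Nothing left means there are no words; avoid the [""] result of split.
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Trailing empty strings are dropped by split automatically.
        return trimmed.split(DELIMITERS);
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String line = sc.hasNextLine() ? sc.nextLine() : "";
        String[] arr = words(line);
        System.out.println(arr.length);
        for (String w : arr) {
            System.out.println(w);
        }
    }
}
```

For the example ".Hi?there[broski.]@@@@@" this should report three words: Hi, there and broski.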
Upvotes: 1