Reputation: 1196
I have a String and I want to process every single word in it. For example:
"That's a good question"
I need to process each word separately:
That, s, a, good, question
I don't need to save them; I only need to read the individual words.
I was testing this solution:
String s = "That's a good question";
String[] words = s.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].replaceAll("[^\\w]", "");
}
but I don't know what regular expression I need to split "That's" into two separate words.
Upvotes: 1
Views: 4292
Reputation: 3609
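If I understood you correctly, this is what you're looking for. Replace
String[] words = s.split("\\s+");
with
String[] words = s.split("[\\s']+");
so that the apostrophe is treated as a separator too. The trailing + keeps consecutive separators (two spaces, or an apostrophe next to a space) from producing empty strings.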
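To illustrate the idea, here is a minimal runnable sketch of splitting on both whitespace and apostrophes. The class and method names (SplitOnApostrophe, splitWords) are made up for this example, and it uses the character class with a + quantifier so that runs of separators collapse into one.

```java
import java.util.Arrays;

public class SplitOnApostrophe {
    // Splits on runs of whitespace and/or apostrophes,
    // so "That's" yields the two tokens "That" and "s".
    static String[] splitWords(String s) {
        return s.split("[\\s']+");
    }

    public static void main(String[] args) {
        String[] words = splitWords("That's a good question");
        System.out.println(Arrays.toString(words)); // [That, s, a, good, question]
    }
}
```

Note that if the input starts with a separator (for example a leading space), split returns an empty string as the first element, so you may want to trim the input first.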
Upvotes: 1
Reputation: 8446
Are you completely sure you need to consider "that's" as two words (viz. "that is")?
Ordinarily, I believe "that's" is counted as one word in English.
But if your perspective on the requirements is correct, you have a (moderately) difficult problem: I don't think there is any (reasonable) regex that can distinguish between something like "that's" (contraction of "that" and "is") and something like "steve's" (possessive).
AFAIK you will have to write something yourself.
Suggestion: take a look at this list of English language contractions. You could use it to make an enumeration of the things you need to handle in a special way.
enum Contraction {
AINT("ain't", "is not"),
ARENT("aren't", "are not"),
// Many, many in between...
YOUVE("you've", "you have");
private final String oneWord;
private final String twoWords;
private Contraction(String oneWord, String twoWords) {
this.oneWord = oneWord;
this.twoWords = twoWords;
}
public String getOneWord() {
return oneWord;
}
public String getTwoWords() {
return twoWords;
}
}
String s = "That's a good question".toLowerCase();
for (Contraction c : Contraction.values()) {
// replace() treats its arguments as literal text, so no regex escaping is needed
s = s.replace(c.getOneWord(), c.getTwoWords());
}
String[] words = s.split("\\s+");
// And so forth...
NOTE: This example handles case sensitivity by converting the entire input to lower case, so the elements in the enum will match. If that doesn't work for you, you may need to handle it in another way.
I'm not clear on what you need to do with the words once you have them, so I left that part out.
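If lower-casing the whole input is not acceptable, one possible alternative is a case-insensitive literal match. The following is only a sketch under assumptions: the class name ContractionExpander and its two-entry table are hypothetical stand-ins for the full enum, and it treats "that's" as a contraction, which (as said above) cannot be told apart from a possessive by pattern alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContractionExpander {
    // Hypothetical two-entry table for illustration only;
    // a real one would cover the whole list of contractions.
    private static final String[][] TABLE = {
        {"that's", "that is"},
        {"aren't", "are not"},
    };

    static String expand(String s) {
        for (String[] entry : TABLE) {
            // Pattern.quote makes the contraction match as literal text;
            // CASE_INSENSITIVE lets "That's" and "THAT'S" match as well.
            Pattern p = Pattern.compile(Pattern.quote(entry[0]), Pattern.CASE_INSENSITIVE);
            s = p.matcher(s).replaceAll(Matcher.quoteReplacement(entry[1]));
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(expand("That's a good question")); // that is a good question
    }
}
```

The rest of the input keeps its original case; only the replaced text takes the casing stored in the table. The match is a plain substring match with no word boundaries, so it may need tightening for real input.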
Upvotes: 1
Reputation: 59
This should work. Replace 's with the second word before running the string through the split method. Strings are immutable, so assign the result back:
s = s.replaceAll("'s", " is");
String[] words = s.split("\\s+");
This turns "That's" into the two words "That" and "is", if that's what you're looking to do.
Upvotes: 0
Reputation: 500
If you're looking for a regex that matches the apostrophe, you can use this one to match a whole string containing an apostrophe or a double quote:
.*["'].*
and this one matches the apostrophe (or double quote) itself:
["']
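As a rough sketch of how these two patterns could be used from Java (the class and method names here are invented for the example): String.matches has to match the entire input, which is why the first pattern is wrapped in .*, while Matcher.find locates each individual quote character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ApostropheMatch {
    // True if the (single-line) string contains an apostrophe or a double quote.
    static boolean containsQuote(String s) {
        return s.matches(".*[\"'].*");
    }

    // Counts the apostrophes and double quotes in the string.
    static int countQuotes(String s) {
        Matcher m = Pattern.compile("[\"']").matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(containsQuote("That's a good question")); // true
        System.out.println(countQuotes("That's a good question"));   // 1
    }
}
```

Because . does not match line terminators by default, containsQuote as written is only meant for single-line input.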
Upvotes: 0