user4789408

Reputation: 1196

Java read word by word from a String

I have a String and I would like to process every single word in it. For example:

"That's a good question"

I need to process each word separately:

That, s, a, good, question

I don't need to store the words; I only need to read them one at a time.

I was testing this solution:

String s = "That's a good question";
String[] words = s.split("\\s+");
for (int i = 0; i < words.length; i++) {
     words[i] = words[i].replaceAll("[^\\w]", "");
}

but I don't know what regular expression I need to split "That's" into two separate words.

Upvotes: 1

Views: 4292

Answers (4)

Przemysław Moskal

Reputation: 3609

If I understand you correctly, this is what you're looking for: replace String[] words = s.split("\\s+"); with String[] words = s.split("[\\s']");.
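A minimal sketch of that change, for illustration only. The class name SplitWords and the helper splitWords are made up for this example, and the trailing + in the pattern is an addition not present in the answer: it treats a run of delimiters (such as an apostrophe followed by a space) as a single separator so that no empty strings appear between words.

```java
class SplitWords {
    // Hypothetical helper: splits on runs of whitespace and/or apostrophes.
    // The '+' quantifier is an assumption added on top of the answer's pattern.
    static String[] splitWords(String s) {
        return s.trim().split("[\\s']+");
    }

    public static void main(String[] args) {
        // Prints That, s, a, good, question, one per line.
        for (String word : splitWords("That's a good question")) {
            System.out.println(word);
        }
    }
}
```

Note that trim() only strips surrounding whitespace, so an input that starts with an apostrophe would still produce an empty first element.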

Upvotes: 1

Drew Wills

Reputation: 8446

Are you completely sure you need to consider that's as two words? (viz. that is)

Ordinarily, I believe that's is counted as one word in English.

But if your perspective on the requirements is correct, you have a (moderately) difficult problem: I don't think there is any (reasonable) regex that can distinguish between something like that's (contraction of that and is) and something like steve's (possessive).

AFAIK you will have to write something yourself.

Suggestion: take a look at this list of English language contractions. You could use it to make an enumeration of the things you need to handle in a special way.

Basic Example

enum Contraction {
    AINT("ain't", "is not"),
    ARENT("aren't", "are not"),
    // Many, many in between...
    YOUVE("you've", "you have");

    private final String oneWord;
    private final String twoWords;

    private Contraction(String oneWord, String twoWords) {
        this.oneWord = oneWord;
        this.twoWords = twoWords;
    }

    public String getOneWord() {
        return oneWord;
    }

    public String getTwoWords() {
        return twoWords;
    }
}

String s = "That's a good question".toLowerCase();
for (Contraction c : Contraction.values()) {
    // replace() does a literal (non-regex) substitution of every occurrence
    s = s.replace(c.getOneWord(), c.getTwoWords());
}
String[] words = s.split("\\s+");
// And so forth...

NOTE: This example handles case sensitivity by converting the entire input to lower case, so the elements in the enum will match. If that doesn't work for you, you may need to handle it in another way.

I'm not clear on what you need to do with the words once you have them, so I left that part out.
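As a possible variation on the example above, the lookup could be done per token with a Map instead of running a replacement over the whole string for every enum constant. This is only a sketch: the class ContractionExpander, the method words, and the three table entries (including "that's", which is not in the enum above) are assumptions chosen for illustration, and punctuation attached to a word is not handled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContractionExpander {
    // Hypothetical, deliberately tiny table; a real one needs many more entries.
    private static final Map<String, String> CONTRACTIONS = new HashMap<>();
    static {
        CONTRACTIONS.put("that's", "that is");
        CONTRACTIONS.put("aren't", "are not");
        CONTRACTIONS.put("you've", "you have");
    }

    // Lower-cases the input, splits on whitespace, and expands known contractions.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.toLowerCase().trim().split("\\s+")) {
            String expanded = CONTRACTIONS.getOrDefault(token, token);
            for (String word : expanded.split(" ")) {
                if (!word.isEmpty()) {
                    result.add(word);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("That's a good question"));
    }
}
```

Because each token is matched as a whole, a possessive such as steve's passes through unchanged unless it is explicitly added to the table.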

Upvotes: 1

Matt

Reputation: 59

This should work: replace 's with the second word before running the string through the split method.

s = s.replaceAll("'s", " is");
String[] words = s.split("\\s+");

This turns That's into the two words That and is, if that's what you're looking to do.

Upvotes: 0

Yousef

Reputation: 500

If you're looking for a regex that matches the apostrophe, you can use this one to match a whole string that contains it:

.*["'].*

and this one matches the apostrophe itself:

["']
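A short sketch of how these two patterns might be used from Java. The class ApostropheDemo and its two helper methods are hypothetical names for this example. String.matches requires the whole string to match, which is why the first pattern is wrapped in .* on both sides, while String.split only needs the delimiter itself.

```java
class ApostropheDemo {
    // True if the token contains a single or double quote anywhere.
    static boolean containsQuote(String token) {
        return token.matches(".*[\"'].*");
    }

    // Splits the token at each single or double quote.
    static String[] splitOnQuote(String token) {
        return token.split("[\"']");
    }

    public static void main(String[] args) {
        System.out.println(containsQuote("That's"));   // true
        System.out.println(containsQuote("good"));     // false
        for (String part : splitOnQuote("That's")) {
            System.out.println(part);                  // That, then s
        }
    }
}
```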

Upvotes: 0
