Atai Voltaire

Reputation: 133

Regular expression that extracts consecutive words in a sentence

I'm trying to find a regular expression in Java that will extract pairs of consecutive words in a sentence, as in the example below.

input: word1 word2 word3 word4 ....

output:

word1 word2
word2 word3
word3 word4
etc.

Any idea how to do that?

Upvotes: 3

Views: 967

Answers (4)

Marko Topolnik

Reputation: 200168

To offer a solution without unjustified complexity:

final String in = "word1 word2 word3 word4";
final String[] words = in.split("\\s+");
for (int i = 0; i < words.length - 1; i++)
  System.out.println(words[i] + " " + words[i+1]);

prints

word1 word2
word2 word3
word3 word4

Upvotes: 0

Ωmega

Reputation: 43673

Java code:

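import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match the empty position at the start of each word; the lookahead captures
// a pair without consuming it, so the next pair can start at the second word.
Matcher m = Pattern.compile("(?:^|(?<=\\s))(?=(\\S+\\s+\\S+)(?=\\s|$))")
  .matcher("word1 word2 word3 word4");
while (m.find()) {
  System.out.println(m.group(1));
}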

Output:

word1 word2
word2 word3
word3 word4


Upvotes: 3

Pawel Solarski

Reputation: 1048

Here you are:

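import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String words = "word1 word2 word3 word4";
        // Two runs of word characters separated by whitespace.
        String regex = "\\w+\\s+\\w+";
        Pattern p = Pattern.compile(regex);
        Matcher matcher = p.matcher(words);
        while (matcher.find()) {
            String found = matcher.group();
            System.out.println(found);
            String second = found.split("\\s+")[1];
            // Keep the text from the second word onward so the next pair overlaps this one.
            // (String.replace would rewrite every occurrence of the pair, not just this one.)
            words = words.substring(matcher.end() - second.length());
            matcher = p.matcher(words);
        }
    }
}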

Upvotes: 0

Rohit Jain

Reputation: 213261

There you go:

"\\w+\\s+\\w+"

One or more word characters, then one or more whitespace characters, and then one or more word characters.


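UPDATE:

I just noticed that the above regex misses the second line of your output: matches cannot overlap, so once "word1 word2" has been consumed, the next match starts at word3. Instead, you can split your string on whitespace and work with the resulting array.

String[] words = str.split("\\s+");

Then combine the words at each pair of adjacent indices.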
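To see the difference concretely, here is a minimal sketch comparing the plain find() loop with the split-based approach. The class and method names (PairDemo, regexPairs, splitPairs) are made up for illustration, and the trim() call is an added assumption to guard against leading whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairDemo {
    // Pairs found by a plain find() loop; successive matches cannot overlap.
    static List<String> regexPairs(String str) {
        List<String> pairs = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+\\s+\\w+").matcher(str);
        while (m.find()) {
            pairs.add(m.group());
        }
        return pairs;
    }

    // Pairs built from adjacent indices of the split array.
    static List<String> splitPairs(String str) {
        List<String> pairs = new ArrayList<>();
        String[] words = str.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            pairs.add(words[i] + " " + words[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String str = "word1 word2 word3 word4";
        System.out.println(regexPairs(str)); // [word1 word2, word3 word4]
        System.out.println(splitPairs(str)); // [word1 word2, word2 word3, word3 word4]
    }
}
```

The regex version skips "word2 word3" because word2 was already consumed by the first match; the split version has no such restriction.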

Upvotes: -1
