Reputation: 133
I'm trying to find a regular expression in Java that will extract pairs of consecutive words in a sentence, as in the example below.
input: word1 word2 word3 word4 ....
output:
etc..
Any idea how to do that?
Upvotes: 3
Views: 967
Reputation: 200168
To offer a solution without unjustified complexity...
final String in = "word1 word2 word3 word4";
final String[] words = in.split("\\s+");
for (int i = 0; i < words.length - 1; i++)
    System.out.println(words[i] + " " + words[i+1]);
prints
word1 word2
word2 word3
word3 word4
Upvotes: 0
Reputation: 43673
Matcher m = Pattern.compile("(?:^|(?<=\\s))(?=(\\S+\\s+\\S+)(?=\\s|$))")
        .matcher("word1 word2 word3 word4");
while (m.find()) {
    System.out.println(m.group(1));
}
word1 word2
word2 word3
word3 word4
Upvotes: 3
Reputation: 1048
Here you are:
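import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {
    public static void main(String[] args) {
        String words = "word1 word2 word3 word4";
        String regex = "\\w+\\s+\\w+";
        Pattern p = Pattern.compile(regex);
        Matcher matcher = p.matcher(words);
        while (matcher.find()) {
            String found = matcher.group();
            System.out.println(found);
            // Second word of the pair; it becomes the first word of the next pair.
            String second = found.split("\\s+")[1];
            // Drop everything before the second word and match again.
            // (String.replace would rewrite every occurrence of the pair,
            // which breaks on inputs with repeated pairs such as "a b a b".)
            words = words.substring(matcher.end() - second.length());
            matcher = p.matcher(words);
        }
    }
}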
Upvotes: 0
Reputation: 213261
There you go: -
"\\w+\\s+\\w+"
One or more word characters, then one or more whitespace characters, and then one or more word characters.
UPDATE : -
Just noticed that the above regex misses your second line of output.
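To see why, here is a minimal sketch (the class and method names are made up for illustration). Each call to find() resumes after the end of the previous match, so the second word of a pair is already consumed and cannot start the next one:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonOverlappingDemo {
    // Collects every match of \w+\s+\w+ using plain find().
    // Matches never overlap, so each word is used at most once.
    static List<String> plainMatches(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+\\s+\\w+").matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [word1 word2, word3 word4]; "word2 word3" is skipped
        System.out.println(plainMatches("word1 word2 word3 word4"));
    }
}
```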
So you can just split your string on whitespace and work with the resulting array.
String[] words = str.split("\\s+");
Then take the words at each pair of adjacent indices, i and i + 1.
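A rough sketch of that idea, wrapped in a helper (consecutivePairs and WordPairs are hypothetical names, not from the question). It assumes words are separated by whitespace, and it trims the input first because split("\\s+") would otherwise return an empty first element when the string starts with whitespace:

```java
import java.util.ArrayList;
import java.util.List;

public class WordPairs {
    // Returns "w[i] w[i+1]" for every pair of adjacent words in the input.
    static List<String> consecutivePairs(String input) {
        List<String> pairs = new ArrayList<>();
        // trim() avoids an empty leading token from split on leading whitespace
        String[] words = input.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            pairs.add(words[i] + " " + words[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // prints [word1 word2, word2 word3, word3 word4]
        System.out.println(consecutivePairs("word1 word2 word3 word4"));
    }
}
```

An input with fewer than two words simply yields an empty list.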
Upvotes: -1