Reputation: 1511
I have a text string that looks as follows:
word word word {{t:word word|word}} word word {{t:word|word}} word word...
I want to extract all substrings that start with "{{t" and end with "}}". I don't care about the rest. I don't know in advance how many words appear inside "{{..|..}}". If the words inside the braces weren't separated by spaces, splitting the text on spaces would work. I'm not sure how to write a regular expression for this. I thought about walking over the text char by char and storing everything between "{{t:" and "}}", but I would like to know a cleaner way to do the same.
Thank you!
EDIT: Expected output from the above: an array of strings, String[] a, where a[0] is {{t:word word|word}} and a[1] is {{t:word|word}}.
Upvotes: 2
Views: 108
Reputation: 199353
This worked for me:
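import java.util.regex.*;

class WordTest {
    public static void main( String ... args ) {
        String input = "word word word {{t:word word|word}} word word {{t:word|word}} word word...";
        // Lazy match of anything between "{{" and the next "}}".
        // Note: this matches any {{...}} block, not only those starting with "{{t".
        Pattern p = Pattern.compile("(\\{\\{.*?\\}\\})");
        Matcher m = p.matcher( input );
        while( m.find() ) {
            // group(1) is the whole match here, delimiters included
            System.out.println( m.group(1) );
        }
    }
}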
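To get the String[] the question asks for, the same find() loop can collect the matches into a list and convert it at the end. This is only a sketch: the class and method names (TagExtractor, extractTags) are made up for illustration, and the pattern is narrowed to tags starting with "{{t:", which is an assumption about what should count as a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // Lazy match from "{{t:" up to the nearest "}}" (assumed tag format).
    private static final Pattern TAG = Pattern.compile("\\{\\{t:.*?\\}\\}");

    // Returns every {{t:...}} tag in the text, in order of appearance.
    public static String[] extractTags(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            found.add(m.group()); // whole match, delimiters included
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "word word word {{t:word word|word}} word word {{t:word|word}} word word...";
        for (String tag : extractTags(input)) {
            System.out.println(tag);
        }
    }
}
```

A List is used while collecting because the number of matches is not known up front; the array is only built once at the end.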
Upvotes: 0
Reputation: 35068
How about the following? It uses non-greedy matching, so that a single match doesn't run from the first "{{t:" to the last "}}" and capture ":word word|word}} word word {{t:word|word".
String s = "word word word {{t:word word|word}} word word {{t:word|word}} word word";
Pattern p = Pattern.compile("\\{\\{t:(.*?)\\}\\}");
Matcher m = p.matcher(s);
while (m.find()) {
//System.out.println(m.group(1));
System.out.println(m.group());
}
Edit:
changed to m.group() so that results contain delimiters.
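The effect of the non-greedy quantifier is easy to check by running both variants on the same input. The sketch below is only an illustration; the class and method names (GreedyVsLazy, findAll) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Collects every whole match of the given regex in the text.
    public static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String s = "word {{t:word word|word}} word {{t:word|word}} word";

        // Greedy ".*" runs to the LAST "}}", so both tags end up in one match.
        System.out.println(findAll("\\{\\{t:.*\\}\\}", s));

        // Lazy ".*?" stops at the NEAREST "}}", giving one match per tag.
        System.out.println(findAll("\\{\\{t:.*?\\}\\}", s));
    }
}
```

With the greedy pattern the list holds a single string spanning both tags; with the lazy pattern it holds the two tags separately.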
Upvotes: 3
Reputation: 48226
Using the java.util.regex package works well here:
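import java.util.*;
import java.util.regex.*;

String str = "word word word {{t:word word|word}} word word {{t:word|word}} word word"; // sample input (assumed)
Pattern p = Pattern.compile("\\{\\{t(.*?)\\}\\}"); // escaped braces + lazy capturing group
Matcher m = p.matcher(str);
// A HashSet drops duplicates and does not keep order; use a List to preserve both
Set<String> result = new HashSet<String>();
while (m.find()) {
    result.add(m.group(1)); // text after "{{t" and before "}}", e.g. ":word|word"
}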
The capturing group can also wrap the entire regex so that the result includes the {{ and }}, like so: "(\\{\\{t.*?\\}\\})"
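The difference between the whole match and a capturing group can be seen side by side. In the sketch below, group 0 is the entire match with delimiters and group 1 is only the captured inner text. The names GroupDemo and extract are hypothetical, and the pattern assumes tags of the form "{{t:...}}".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Capturing group 1 holds the text between "{{t:" and "}}".
    private static final Pattern TAG = Pattern.compile("\\{\\{t:(.*?)\\}\\}");

    // group == 0 returns whole matches; group == 1 returns only the captured part.
    public static List<String> extract(String text, int group) {
        List<String> out = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            out.add(m.group(group));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "word {{t:word word|word}} word {{t:word|word}} word";
        System.out.println(extract(s, 0)); // with delimiters
        System.out.println(extract(s, 1)); // inner text only
    }
}
```

Calling m.group() with no argument is equivalent to m.group(0), so no extra parentheses are needed in the regex just to get the delimiters.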
Upvotes: 3