Reputation: 3913
I need to extract the word following 'NN' in this particular sentence:
(ROOT (SBARQ [26.015] (WHNP [1.500] (WP [1.051] What)) (SQ [23.912] (VBZ [2.669] 's)
(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))
So when I parse this using a regex, I need to extract only the word 'thought'.
How do I do that?
My code:
String pattern = "\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] (.*)\\)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(st);
while (m.find()) {
    System.out.println(m.group());
}
output: (NN [9.843] thought))) (. [0.002] ?)))
But I want only 'thought'
Answer:
Got it :-) thanks people.
String pattern = "NN \\[.*] (\\w+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(st);
while (m.find()) {
    System.out.println(m.group(1));
}
output: thought
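For anyone landing here later, the key change is `m.group(1)` instead of `m.group()`: `group()` returns the whole match, while `group(1)` returns only what the first pair of parentheses captured. Here is a minimal sketch of that difference, using the pattern above on the tail of the parse string from the question. The class name `NnGroupDemo` and the helper `firstMatch` are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NnGroupDemo {
    // Tail of the parse string from the question.
    static final String TREE =
        "(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))";

    // Hypothetical helper: returns {whole match, group 1} for the first match,
    // or null when the pattern does not match at all.
    static String[] firstMatch(String input) {
        Matcher m = Pattern.compile("NN \\[.*] (\\w+)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = firstMatch(TREE);
        System.out.println("group():  " + r[0]); // everything the regex matched
        System.out.println("group(1): " + r[1]); // only the captured word
    }
}
```

Note that the greedy `.*` inside the brackets only lands on the right `]` because of backtracking; on input with several NN nodes it may skip ahead to a later one, so a tighter class such as `[^\]]*` is probably safer.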
Upvotes: 2
Views: 1037
Reputation: 1433
Given that the format doesn't allow much kinky stuff, this should get the word:
\(NN \[[^\]]*\] ([^\)]*)\)
and then do something like:
Matcher matcher = Pattern.compile("\\(NN \\[[^\\]]*\\] ([^\\)]*)\\)").matcher(yourstring);
if (matcher.find()) {
    theword = matcher.group(1);
}
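If the input can contain more than one NN node, the same expression can be applied in a loop. The following is only a sketch under that assumption; the class `NnWordExtractor` and the method `nnWords` are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NnWordExtractor {
    // The regex \(NN \[[^\]]*\] ([^)]*)\) with backslashes doubled
    // for a Java string literal.
    private static final Pattern NN_WORD =
        Pattern.compile("\\(NN \\[[^\\]]*\\] ([^)]*)\\)");

    // Hypothetical helper: collects the word of every (NN [score] word) node, in order.
    static List<String> nnWords(String tree) {
        List<String> words = new ArrayList<>();
        Matcher m = NN_WORD.matcher(tree);
        while (m.find()) {
            words.add(m.group(1)); // group 1 is the text between "] " and ")"
        }
        return words;
    }

    public static void main(String[] args) {
        String tree = "(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))";
        System.out.println(nnWords(tree));
    }
}
```

Because the pattern requires a space right after `NN`, tags such as `NNS` or `NNP` are not picked up; widen the tag part if those are wanted too.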
Upvotes: 2
Reputation: 4259
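The following regular expression will match the NN block, where the ([^)]*) group will pick up 'thought'. A greedy (.*) in that position would run on to the last closing parenthesis in the input, so the group is restricted to non-parenthesis characters.
\(NN \[[0-9]+(?:\.[0-9]*)?\] ([^)]*)\)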
I always find that regular expression test beds are very useful for this kind of problem. I recommend using: http://www.gskinner.com/RegExr/
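When trying a pattern in a tester, it helps to paste text that continues past the NN block, because a greedy group and a negated character class can capture different things once more closing parentheses follow. A small sketch of that comparison in Java; the class `GreedyVsNegated` and its members are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsNegated {
    // Same prefix in both patterns; only the capture group differs.
    static final String GREEDY  = "\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] (.*)\\)";
    static final String NEGATED = "\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] ([^)]*)\\)";

    // Returns group 1 of the first match, or null if there is no match.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // NN block plus the text that follows it in the question's parse string.
        String tail = "(NN [9.843] thought))) (. [0.002] ?)))";
        System.out.println("greedy:  " + capture(GREEDY, tail));
        System.out.println("negated: " + capture(NEGATED, tail));
    }
}
```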
Upvotes: 0