madCode

Reputation: 3913

Extract a particular word from a text

I need to extract the word following 'NN' in this particular sentence:

(ROOT (SBARQ [26.015] (WHNP [1.500] (WP [1.051] What)) (SQ[23.912] (VBZ[2.669]'s)
(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))

So, when I parse this using a regex, I need to extract only the word 'thought'.

How do I do that?

My code:

String pattern = "\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] (.*)\\)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(st);
while (m.find()) {
    System.out.println(m.group());
}

output: (NN [9.843] thought))) (. [0.002] ?)))

But I want only 'thought'

Answer:

Got it :-) thanks people.

String pattern = "NN \\[.*] (\\w+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(st);
while (m.find()) {
    System.out.println(m.group(1));
}

output: thought
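
For anyone wondering why this works: m.group() with no argument returns the whole match, while m.group(1) returns only what the first pair of parentheses captured. Below is a minimal sketch of that difference using the pattern above. The class and method names (GroupDemo, wholeMatch, capturedWord) are made up for illustration, and the sample sentence is assumed to be joined onto one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    // Pattern from the answer above: the parentheses around \w+ form capture group 1.
    private static final Pattern NN_WORD = Pattern.compile("NN \\[.*] (\\w+)");

    /** Returns the whole match (group 0), or null if nothing matches. */
    static String wholeMatch(String parse) {
        Matcher m = NN_WORD.matcher(parse);
        return m.find() ? m.group() : null;
    }

    /** Returns only the captured word (group 1), or null if nothing matches. */
    static String capturedWord(String parse) {
        Matcher m = NN_WORD.matcher(parse);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Sample sentence from the question, assumed to be on a single line.
        String st = "(ROOT (SBARQ [26.015] (WHNP [1.500] (WP [1.051] What)) (SQ[23.912] (VBZ[2.669]'s) "
                  + "(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))";
        System.out.println(wholeMatch(st));    // NN [9.843] thought
        System.out.println(capturedWord(st));  // thought
    }
}
```

Note that the greedy .* inside the brackets only happens to work here because the sentence contains a single word that follows a "] ". With several tagged words on one line it would likely skip ahead to the last one, so a tighter pattern may be safer.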

Upvotes: 2

Views: 1037

Answers (2)

Nodebody

Reputation: 1433

Given that the format is regular and doesn't allow much nesting inside a node, this should get the word:

\(NN \[[^\]]*\] ([^\)]*)\)

and then do something like

Matcher matcher = Pattern.compile("\\(NN \\[[^\\]]*\\] ([^)]*)\\)").matcher(yourstring);
if (matcher.find()) {
  theword = matcher.group(1);
}
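
If a parse contains more than one NN node, the same idea can be looped. Here is a rough sketch, assuming the negated character classes from the pattern above. The class name NnExtractor, the method extractNouns and the two-noun parse string are invented for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NnExtractor {
    // [^\]]* stays inside the bracketed score; [^)]* stops at the node's closing parenthesis.
    private static final Pattern NN_NODE = Pattern.compile("\\(NN \\[[^\\]]*\\] ([^)]*)\\)");

    /** Collects the word of every (NN [score] word) node, in order of appearance. */
    static List<String> extractNouns(String parse) {
        List<String> words = new ArrayList<>();
        Matcher m = NN_NODE.matcher(parse);
        while (m.find()) {
            words.add(m.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical parse with two NN nodes, made up for this example.
        String parse = "(NP (DT [1.0] the) (NN [2.5] cat)) "
                     + "(VP (VBD [1.2] chased) (NP (DT [1.0] a) (NN [3.1] mouse)))";
        System.out.println(extractNouns(parse)); // [cat, mouse]
    }
}
```

Because the pattern requires a space right after NN, tags such as NNS or NNP are not picked up. Whether that is desirable depends on what you want to extract.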

Upvotes: 2

Paul

Reputation: 4259

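The following regular expression will match the NN block, where the lazy (.*?) group will pick up 'thought'. A greedy (.*) would run on to the last closing parenthesis on the line and capture too much.

\(NN \[[0-9]+(?:\.[0-9]*)?\] (.*?)\)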

I always find that regular expression test beds are very useful for this kind of problem. I recommend using: http://www.gskinner.com/RegExr/
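
Whether the capture group uses a greedy or a lazy quantifier decides how much text it picks up on the sample sentence. A small sketch to compare the two, assuming the sentence is on a single line; the class and method names (GreedyVsLazy, greedy, lazy) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsLazy {
    // Greedy: (.*) expands as far as it can and backs off only to the last ')'.
    private static final Pattern GREEDY =
            Pattern.compile("\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] (.*)\\)");
    // Lazy: (.*?) expands as little as it can and stops at the first ')'.
    private static final Pattern LAZY =
            Pattern.compile("\\(NN \\[[0-9]+(?:\\.[0-9]*)?\\] (.*?)\\)");

    private static String firstGroup(Pattern p, String parse) {
        Matcher m = p.matcher(parse);
        return m.find() ? m.group(1) : null;
    }

    static String greedy(String parse) { return firstGroup(GREEDY, parse); }

    static String lazy(String parse) { return firstGroup(LAZY, parse); }

    public static void main(String[] args) {
        // Sample sentence from the question, assumed to be on a single line.
        String st = "(ROOT (SBARQ [26.015] (WHNP [1.500] (WP [1.051] What)) (SQ[23.912] (VBZ[2.669]'s) "
                  + "(NP [19.076] (PRP$ [3.816] your) (NN [9.843] thought))) (. [0.002] ?)))";
        System.out.println(greedy(st)); // thought))) (. [0.002] ?))
        System.out.println(lazy(st));   // thought
    }
}
```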

Upvotes: 0
