Difficulty using JAPE Grammar

Question

I have a document that contains sections such as Assessments, HPI, ROS, Vitals, etc., and I want to extract the notes in each section. I am using GATE for this. I have written a JAPE file that extracts the notes in the Assessment section. Here is the grammar:

Input: Token
Options: control=appelt debug=true

Rule: Assess
({Token.string =~"(?i)diagnose[d]?"}{Token.string=="with"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)from"} | {Token.string=~"(?i)suffering"}{Token.string=~"(?i)with"})

(
({Token})*
):assessments

({Token.string =~"(?i)HPI"} | {Token.string =~"(?i)ROS"} | {Token.string =~"(?i)EXAM"} | {Token.string =~"(?i)VITAL[S]"} | {Token.string =~"(?i)TREATMENT[s]"} |{Token.string=~"(?i)use[d]?"}{Token.string=~"(?i)orderset[s]?"} | {Token.string=~"$"})


-->
:assessments.Assessments = {}

When the Assessment section is at the end of the document, the notes are retrieved correctly. But if it sits between two other sections, the rule returns everything from the Assessment section to the end of the file.

I have tried using {Token.string=~"$"} in different ways, but could not extract only the Assessment section irrespective of its place in the document.

Please explain how I can achieve this using JAPE grammar.

Ian Roberts · Accepted Answer

That is the expected behaviour: Appelt mode always prefers the longest possible overall match. Since =~ only requires the pattern to be found somewhere in the string, every Token matches string =~ "$", so the assessments label will grab all but the final token in the document.
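To see this in isolation, here is a minimal plain-Java sketch. It assumes, as the GATE documentation describes, that =~ behaves like java.util.regex's find() and ==~ like matches(). The class name, method names, and sample token strings are made up for illustration and are not part of GATE.

```java
import java.util.List;
import java.util.regex.Pattern;

public class DollarMatchDemo {
    // Mirrors JAPE's =~ operator: true if the regex is found anywhere in the string.
    static boolean findMatch(String regex, String token) {
        return Pattern.compile(regex).matcher(token).find();
    }

    // Mirrors JAPE's ==~ operator: true only if the regex matches the whole string.
    static boolean wholeMatch(String regex, String token) {
        return Pattern.compile(regex).matcher(token).matches();
    }

    public static void main(String[] args) {
        // "$" matches the empty position at the end of any string,
        // so a find-style match succeeds for every token.
        for (String token : List.of("HPI", "with", "fever", ".")) {
            System.out.println(token + " =~ \"$\": " + findMatch("$", token));
        }
        // A whole-string match against "$" fails for any non-empty token.
        System.out.println("HPI ==~ \"$\": " + wholeMatch("$", "HPI"));
    }
}
```

So the "$" alternative does not act as an end-of-document marker; it is simply a terminator that any token can satisfy.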

I would adopt a two-pass approach: use an initial gazetteer or JAPE phase to annotate the section headings, then run a second phase with only these Heading annotations in its Input line:

Imports: { import static gate.Utils.*; }
Phase: AnnotateBetweenHeadings
Input: Heading
Options: control = appelt

Rule: TwoHeadings
({Heading.type == "assessments"}):h1
(({Heading})?):h2
-->
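{
  // Default to the end of the document in case no heading follows.
  Long endOffset = end(doc);
  AnnotationSet h2Annots = bindings.get("h2");
  if(h2Annots != null && !h2Annots.isEmpty()) {
    // A following heading was matched, so stop where it starts.
    endOffset = start(h2Annots);
  }
  try {
    outputAS.add(end(bindings.get("h1")), endOffset, "Assessments", featureMap());
  } catch(InvalidOffsetException e) {
    // add(Long, Long, ...) declares this checked exception, so it must be handled here.
    throw new GateRuntimeException(e);
  }
}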

This will annotate everything between the end of the assessments heading and the start of the following heading, or the end of the document if there is no following heading.
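The offset arithmetic can also be tried outside GATE. The following is a rough plain-Java sketch of the same idea, not GATE code: the Heading class, the helper names, and the sample text are all hypothetical stand-ins, and unlike the JAPE rule it only handles the first heading of the requested type.

```java
import java.util.List;

public class SectionSpanDemo {
    // Hypothetical stand-in for a GATE Heading annotation: a type plus character offsets.
    static final class Heading {
        final String type;
        final int start;
        final int end;
        Heading(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Builds a Heading by locating a label in the text (stands in for the first pass).
    static Heading locate(String doc, String type, String label) {
        int start = doc.indexOf(label);
        return new Heading(type, start, start + label.length());
    }

    // Returns the text between the end of the first heading of the given type and
    // the start of the next heading, or the end of the document if none follows.
    // Returns null if there is no such heading. Headings must be in document order.
    static String extract(String doc, List<Heading> headings, String type) {
        for (int i = 0; i < headings.size(); i++) {
            Heading h1 = headings.get(i);
            if (!h1.type.equals(type)) {
                continue;
            }
            int endOffset = doc.length();
            if (i + 1 < headings.size()) {
                endOffset = headings.get(i + 1).start;
            }
            return doc.substring(h1.end, endOffset).trim();
        }
        return null;
    }

    public static void main(String[] args) {
        String doc = "HPI: cough for 3 days. ASSESSMENT: diagnosed with flu. VITALS: BP 120/80.";
        List<Heading> headings = List.of(
            locate(doc, "hpi", "HPI:"),
            locate(doc, "assessments", "ASSESSMENT:"),
            locate(doc, "vitals", "VITALS:"));
        System.out.println(extract(doc, headings, "assessments"));
    }
}
```

Because the end offset comes from the next heading rather than from a token pattern, the result is the same whether the section is in the middle of the document or at the end.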
