prasanth

Reputation: 387

Looping issue - UIMA Ruta

Objective:

To assign heading levels.

The first heading is assigned level 1. I extract its font family and size and find the matching headings. Once a heading's level is assigned, I unmark it, preserving the heading and its features in another annotation (HeadingHierarchy). Once a level is finished, I call the same block again, repeating as long as any heading is left in the Headinglevel annotation.

Issue:

The script works fine for finding all level-1 headings. But when the block is executed via the CALL statement, it finds only the first match for each level (level 2 onwards). Hence the total number of levels for the input below becomes 10, whereas it should be 4.

Input (.txt):

Apache UIMA Ruta Overview =>Arial,18
What is Apache UIMA Ruta? =>Arial,16
Getting started =>Arial,16
UIMA Analysis Engines =>Arial,16
Ruta Engine =>Times New Roman,14
Configuration Parameters =>Arial,10
Annotation Writer =>Times New Roman,14
Configuration Parameters =>Arial,10
Apache UIMA Ruta Language =>Arial,18
Syntax =>Arial,16
Rule elements and their matching order =>Arial,16

Script:

PACKAGE uima.ruta.example;

DECLARE Headinglevel(STRING family, INT size, INT level);
DECLARE HeadingHierarchy(STRING family, INT size, INT level);
DECLARE FontFamily, FontSize;

STRING family;
INT size;

RETAINTYPE(BREAK);
    BREAK? #{-PARTOF(Headinglevel)} @SPECIAL+ W+ COMMA NUM{->MARK(Headinglevel,2,6), MARK(HeadingHierarchy,2,6), MARK(FontFamily,4), MARK(FontSize,6)};
RETAINTYPE;

h:Headinglevel{->h.family = family, HeadingHierarchy.family = family}
<-{FontFamily{PARSE(family)};};

h:Headinglevel{->h.size = size, HeadingHierarchy.size = size}
<-{FontSize{PARSE(size)};};

INT i=1;

BLOCK(ForEachHeadLevel)Document{}
{
    # h:Headinglevel{-> family = h.family, size = h.size};
    h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
Headinglevel{->i=i+1, CALL(Test2.ForEachHeadLevel)};
Document{->LOG("    LEVELS : " + (i))};

Expected Output:

   HeadingHierarchy                        Feature

Apache UIMA... =>Arial,18                  level: 1
What is Apa... =>Arial,16                  level: 2
Getting sta... =>Arial,16                  level: 2
UIMA Analys... =>Arial,16                  level: 2
Ruta Engine... =>Times New Roman,14        level: 3
Configurati... =>Arial,10                  level: 4
Annotation ... =>Times New Roman,14        level: 3
Configurati... =>Arial,10                  level: 4
Apache UIMA... =>Arial,18                  level: 1
Syntax =>Ar... =>Arial,16                  level: 2
Rule elemen... =>Arial,16                  level: 2

Upvotes: 1

Views: 150

Answers (1)

Peter Kluegl

Reputation: 3113

The problem is that CALL restricts the window to the span matched by the rule element. This means that the BLOCK is executed only within a single existing Headinglevel annotation. However, the block needs to see the complete document for its second rule to do its job.
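
To see why the count reaches 10 rather than 4, the windowing effect can be modeled outside Ruta. What follows is a minimal Python sketch, not Ruta code: `HEADINGS` mirrors the font family and size of the question's input, and `count_levels` is a hypothetical helper. It assumes that each CALL sees either only the heading that triggered it (restricted window) or every heading still marked (whole document), and that headings already unmarked no longer trigger the rule.

```python
# Simplified model of the question's script (an illustration, not Ruta code).
HEADINGS = [
    ("Arial", 18), ("Arial", 16), ("Arial", 16), ("Arial", 16),
    ("Times New Roman", 14), ("Arial", 10), ("Times New Roman", 14),
    ("Arial", 10), ("Arial", 18), ("Arial", 16), ("Arial", 16),
]

def count_levels(headings, restricted_window):
    """Return the final value of i for the modeled script."""
    remaining = list(range(len(headings)))  # indices still marked Headinglevel
    # Direct execution of the block: the whole document is visible,
    # so every heading with the first heading's style gets level 1.
    first_style = headings[remaining[0]]
    remaining = [k for k in remaining if headings[k] != first_style]
    i = 1
    # Each heading that is still marked triggers i = i + 1 and a CALL.
    while remaining:
        i += 1
        current = remaining[0]
        # Assumption: a restricted window contains only the triggering heading.
        visible = [current] if restricted_window else remaining
        style = headings[current]
        matched = {k for k in visible if headings[k] == style}
        remaining = [k for k in remaining if k not in matched]
    return i

print(count_levels(HEADINGS, restricted_window=True))   # 10
print(count_levels(HEADINGS, restricted_window=False))  # 4
```

Under these assumptions the restricted window labels one heading per call, so the nine headings left after level 1 each consume a level of their own.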

This is most likely not the best solution, but it is the first one that came to mind.

You could reset the window within the BLOCK to the complete document regardless of the restriction of the CALL action with DOCUMENTBLOCK:

BLOCK (ForEachHeadLevel)Document{}
{
    DOCUMENTBLOCK Document{} 
    {
        # h:Headinglevel{-> family = h.family, size = h.size};
        h:Headinglevel{AND(h.family == family, h.size == size)-> h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
    }
}

DOCUMENTBLOCK is a block extension. You need to include org.apache.uima.ruta.block.DocumentBlockExtension in the additionalExtensions configuration parameter.

Here's another solution using a FOREACH block:

INT i=0;
FOREACH(hl) Headinglevel{}{
    hl{IS(Headinglevel)-> i=i+1, family = hl.family, size = hl.size};
    h:Headinglevel{h.family == family, h.size == size ->  h.level=i, HeadingHierarchy.level = i, UNMARK(h)};
}
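
The logic of this FOREACH variant can be checked against the expected output with a small model. This is a hedged Python sketch, not Ruta code: `assign_levels` is a hypothetical helper, the `headings` list is copied from the question's input, and the `idx in levels` test stands in for the `IS(Headinglevel)` condition on the assumption that an unmarked heading fails it.

```python
# Plain-Python model of the FOREACH approach (an illustration, not Ruta code).
headings = [
    ("Arial", 18), ("Arial", 16), ("Arial", 16), ("Arial", 16),
    ("Times New Roman", 14), ("Arial", 10), ("Times New Roman", 14),
    ("Arial", 10), ("Arial", 18), ("Arial", 16), ("Arial", 16),
]

def assign_levels(items):
    """Return one level per heading, numbered by first appearance of its style."""
    levels = {}  # heading index -> level, playing the role of HeadingHierarchy.level
    i = 0
    for idx, style in enumerate(items):
        if idx in levels:
            # Already handled: models a heading that is no longer a Headinglevel.
            continue
        i += 1
        # Label every still-unlabeled heading that shares this family and size.
        for k, other in enumerate(items):
            if k not in levels and other == style:
                levels[k] = i
    return [levels[k] for k in range(len(items))]

print(assign_levels(headings))  # [1, 2, 2, 2, 3, 4, 3, 4, 1, 2, 2]
```

The resulting list agrees with the levels in the question's expected output, and its maximum is 4.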

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 1
