prasanth

Reputation: 387

UIMA RUTA TABLES

I'm trying to rename the LI and TABLE annotations that come from the HTML conversion, like this:

    Document{-> RETAINTYPE(MARKUP)};
    LI{->MARK(List)}; 
    Document{-> RETAINTYPE};

That works fine. But when I use the same script for a table, like this:

    DECLARE TableContent;
    Document{-> RETAINTYPE(MARKUP)};
    TABLE{->MARK(TableContent)};
    Document{-> RETAINTYPE};

the table is not tagged.

Input File

<table class="IM-Core-Table TableOverride-1" id="t1" border="1">

<colgroup><col /></colgroup>
<colgroup><col /></colgroup>
<colgroup><col /></colgroup>
<colgroup><col /></colgroup><tbody>
<tr class="IM-Core-Table _idGenTableRowColumn-1">
<td valign="top" style=""><p class="MsoNormal"><a name="para201">ICD-10</a></p>
</td>
<td valign="top" style=""><p class="MsoNormal"><a name="para202">Males</a></p>
</td>
<td valign="top" style=""><p class="MsoNormal"><a name="para203">Females</a></p>
</td>
<td valign="top" style=""><p class="MsoNormal"><a name="para204">Total</a></p>
</td>
</tr>
<tr class="IM-Core-Table _idGenTableRowColumn-1">

Mood disorders (F30-F39)

2

10

12

Neurotic, stress-related and somatoform disorders (F40- F48)

0

5

5

Problems related to social environment (Z60)

0

2

2

</tbody>

</table>

Upvotes: 0

Views: 73

Answers (1)

Peter Kluegl

Reputation: 3113

The problem is that the HTML contains spaces and line breaks. By default, the HtmlAnnotator creates an annotation for the content of an HTML element. This means that if there is a line break after the opening tag, the created annotation starts at the offset of that line break. Line breaks, like whitespace and markup, are not visible by default, and any annotation that starts with something invisible is itself invisible to the rules. The simplest solution is to make them visible temporarily and trim unwanted or invisible spans, e.g., whitespace and line breaks, from the beginning and end of the annotation.

Here's the script I used for testing this:

    TYPESYSTEM utils.HtmlTypeSystem;
    ENGINE utils.HtmlAnnotator;
    EXEC(HtmlAnnotator, {TAG});

    DECLARE TableContent;
    RETAINTYPE(MARKUP, WS);
    TABLE{-> TRIM(WS)};
    TABLE{-> TableContent};
    RETAINTYPE;

When I work with the HtmlAnnotator, I often do something like:

    RETAINTYPE(MARKUP, WS);
    TAG{-> TRIM(MARKUP, WS)};
    RETAINTYPE;
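
To make the offset problem concrete, here is a rough sketch in plain Python, outside of Ruta. It is only an illustration of the idea behind TRIM(WS), not how Ruta implements it. The function name `trim_span` and the tiny HTML string are made up for this example, and the "content span" is simply assumed to be everything between the opening and closing table tags.

```python
def trim_span(text, begin, end):
    """Shrink the span [begin, end) so it neither starts nor ends on whitespace.

    Hypothetical helper for illustration; it only mimics trimming whitespace,
    not markup.
    """
    while begin < end and text[begin].isspace():
        begin += 1
    while end > begin and text[end - 1].isspace():
        end -= 1
    return begin, end


# Made-up input: a line break follows the opening tag, as in the question.
html = "<table>\n<tr><td>ICD-10</td></tr>\n</table>"

# Assumed content span of the table element: everything between the tags.
begin = html.index(">") + 1
end = html.rindex("</table>")

# The untrimmed span starts on a line break, which is invisible by default.
print(repr(html[begin]))

# After trimming, the span starts and ends on non-whitespace characters.
new_begin, new_end = trim_span(html, begin, end)
print(repr(html[new_begin:new_end]))
```

Note that the trimmed span in this sketch still starts with a tag, which is why the scripts above keep MARKUP retained while matching, or trim MARKUP as well.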

DISCLAIMER: I am a developer of UIMA Ruta

Upvotes: 0
