I have a Ruta script that recognizes pieces of HTML:
DECLARE SkipRegion;
DECLARE EnhancedMarkup(STRING markupName, STRING markupType);
ADDRETAINTYPE(MARKUP);
// replacement of attributes and '/', '<' and '>' symbols is done to get the tag name
MARKUP{-> CREATE(EnhancedMarkup, "markupName" = replaceAll(replaceAll(MARKUP.ct, "\\s[^=]+=[\"'][^\"']+[\"']", ""), "[\\/<>]", ""), "markupType" = "unknown")};
FOREACH(markup) EnhancedMarkup{} {
markup{startsWith(markup.ct, "<") -> SETFEATURE("markupType", "start")};
markup{startsWith(markup.ct, "</") -> SETFEATURE("markupType", "end")};
markup{endsWith(markup.ct, "/>") -> SETFEATURE("markupType", "empty")};
}
// mark the regions not to be looked through
// so that we get a SkipRegion annotation such as '<sub>Zie ook art. 1 van de Werkloosheidswet</sub>'
(start:EnhancedMarkup{start.markupType == "start"} # end:EnhancedMarkup{end.markupType == "end"}) {
start.markupName == end.markupName -> MARK(SkipRegion)
};
SkipRegion{-> UNMARK(MARKUP)};
REMOVERETAINTYPE(MARKUP);
ADDFILTERTYPE(SkipRegion);
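Here is a rough Python mirror of what the first two rule blocks compute for each tag. It is only a sketch outside Ruta. `markup_name` and `markup_type` are hypothetical helper names, and the sketch assumes Python's `re` handles the two `replaceAll` patterns the same way Java regex does, which holds for these simple character classes.

```python
import re

# Hypothetical Python mirror of the Ruta rules (not Ruta itself).
# Same patterns as the two nested replaceAll calls in the CREATE(EnhancedMarkup, ...) rule.
ATTRIBUTE = re.compile(r"""\s[^=]+=["'][^"']+["']""")
BRACKETS = re.compile(r"[/<>]")


def markup_name(tag: str) -> str:
    """Strip attributes first, then '/', '<' and '>', leaving only the tag name."""
    return BRACKETS.sub("", ATTRIBUTE.sub("", tag))


def markup_type(tag: str) -> str:
    """Later checks overwrite earlier ones, like the SETFEATURE calls in the FOREACH block."""
    kind = "unknown"
    if tag.startswith("<"):
        kind = "start"
    if tag.startswith("</"):
        kind = "end"
    if tag.endswith("/>"):
        kind = "empty"
    return kind


for tag in ['<sub>', '<nadruk opmaak="cursief">', '</nadruk>', '</sub>', '<br/>']:
    print(tag, markup_name(tag), markup_type(tag))
```

Because the checks run in order, a tag like `</sub>` is first classified as "start" and then overwritten with "end", so the order of the three rules inside FOREACH matters.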
When I run the script on the following text:
<sub>Zie ook <nadruk opmaak="cursief">art. 1</nadruk> van de Werkloosheidswet</sub>
I expect the following annotations:
[
{
"name": "enhancedMarkup",
"type": "Commons.EnhancedMarkup",
"value": "<sub>",
"startPosition": 0,
"endPosition": 5,
"source": "RUTA_ANNOTATION"
},
{
"name": "skipRegion",
"type": "Commons.SkipRegion",
"value": "<nadruk opmaak=\"cursief\">art. 1</nadruk>",
"startPosition": 13,
"endPosition": 53,
"source": "RUTA_ANNOTATION"
},
{
"name": "enhancedMarkup",
"type": "Commons.EnhancedMarkup",
"value": "<nadruk opmaak=\"cursief\">",
"startPosition": 13,
"endPosition": 38,
"source": "RUTA_ANNOTATION"
},
{
"name": "enhancedMarkup",
"type": "Commons.EnhancedMarkup",
"value": "</nadruk>",
"startPosition": 44,
"endPosition": 53,
"source": "RUTA_ANNOTATION"
},
{
"name": "enhancedMarkup",
"type": "Commons.EnhancedMarkup",
"value": "</sub>",
"startPosition": 77,
"endPosition": 83,
"source": "RUTA_ANNOTATION"
},
{
"name": "skipRegion",
"type": "Commons.SkipRegion",
"value": "<sub>Zie ook <nadruk opmaak=\"cursief\">art. 1</nadruk> van de Werkloosheidswet</sub>",
"startPosition": 0,
"endPosition": 83,
"source": "RUTA_ANNOTATION"
}
]
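The expected SkipRegion offsets can be cross-checked outside Ruta. The sketch below pairs start and end tags with a stack. That is a different technique from the `#` wildcard rule in the script, not a description of how Ruta matches. `skip_regions` and `tag_name` are hypothetical names, and the sketch assumes well-formed markup with no `>` inside attribute values.

```python
import re

TEXT = '<sub>Zie ook <nadruk opmaak="cursief">art. 1</nadruk> van de Werkloosheidswet</sub>'

# Hypothetical tokenizer: assumes no '>' appears inside attribute values.
TAG = re.compile(r"<[^>]+>")
ATTRIBUTE = re.compile(r"""\s[^=]+=["'][^"']+["']""")


def tag_name(tag: str) -> str:
    """Same name extraction as the Ruta replaceAll calls."""
    return re.sub(r"[/<>]", "", ATTRIBUTE.sub("", tag))


def skip_regions(text: str) -> list[tuple[int, int]]:
    """Return (begin, end) offsets of each start/end tag pair with the same name."""
    stack: list[tuple[str, int]] = []
    regions: list[tuple[int, int]] = []
    for m in TAG.finditer(text):
        tag = m.group()
        if tag.endswith("/>"):
            continue  # empty element: nothing to pair
        if tag.startswith("</"):
            # Only close if the innermost open tag has the same name; ignore mismatches.
            if stack and stack[-1][0] == tag_name(tag):
                _, begin = stack.pop()
                regions.append((begin, m.end()))
        else:
            stack.append((tag_name(tag), m.start()))
    return regions


for begin, end in skip_regions(TEXT):
    print(begin, end, TEXT[begin:end])
```

For the input above this yields the spans 13 to 53 and 0 to 83, which match the two expected SkipRegion annotations.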
This worked in Ruta 2.7, but it no longer works in Ruta 2.8: all the annotations listed above are still created except the following one, which is missing:
{
"name": "skipRegion",
"type": "Commons.SkipRegion",
"value": "<sub>Zie ook <nadruk opmaak=\"cursief\">art. 1</nadruk> van de Werkloosheidswet</sub>",
"startPosition": 0,
"endPosition": 83,
"source": "RUTA_ANNOTATION"
}