Rob van Dalen

Can't create a Ruta annotation after upgrading to Ruta 2.8.0

I have a Ruta script that recognizes pieces of HTML:

DECLARE SkipRegion;
DECLARE EnhancedMarkup(STRING markupName, STRING markupType);

ADDRETAINTYPE(MARKUP);

// remove the attributes and the '/', '<' and '>' characters to get the tag name
MARKUP{-> CREATE(EnhancedMarkup, "markupName" = replaceAll(replaceAll(MARKUP.ct, "\\s[^=]+=[\"'][^\"']+[\"']", ""), "[\\/<>]", ""), "markupType" = "unknown")};
FOREACH(markup) EnhancedMarkup{} {
    markup{startsWith(markup.ct, "<") -> SETFEATURE("markupType", "start")};
    markup{startsWith(markup.ct, "</") -> SETFEATURE("markupType", "end")};
    markup{endsWith(markup.ct, "/>") -> SETFEATURE("markupType", "empty")};
}

// mark the regions not to be looked through
// so we get a SkipRegion annotation like '<sub>Zie ook art. 1 van de Werkloosheidswet</sub>'
(start:EnhancedMarkup{start.markupType == "start"} # end:EnhancedMarkup{end.markupType == "end"}) {
    start.markupName == end.markupName -> MARK(SkipRegion)
};
SkipRegion{-> UNMARK(MARKUP)};

REMOVERETAINTYPE(MARKUP);
ADDFILTERTYPE(SkipRegion);
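To check the tag-name extraction outside Ruta, here is a minimal Java sketch of what the two nested replaceAll calls and the FOREACH block should compute for a single tag. It assumes that Ruta's replaceAll, startsWith and endsWith string functions behave like their java.lang.String counterparts (Ruta runs on the JVM, but I have not verified this for these functions); the class MarkupNames and its methods are hypothetical names used only for illustration.

```java
import java.util.List;

/**
 * Hypothetical sketch of the tag-name/tag-type logic from the Ruta script.
 * Assumption: Ruta's replaceAll/startsWith/endsWith behave like java.lang.String's.
 */
class MarkupNames {

    // Same pattern as the first replaceAll: removes attributes such as  opmaak="cursief"
    static final String ATTRIBUTE_PATTERN = "\\s[^=]+=[\"'][^\"']+[\"']";
    // Same pattern as the second replaceAll: removes '/', '<' and '>'
    static final String DELIMITER_PATTERN = "[\\/<>]";

    static String markupName(String tag) {
        return tag.replaceAll(ATTRIBUTE_PATTERN, "").replaceAll(DELIMITER_PATTERN, "");
    }

    // The three FOREACH rules run in order, so a later match overwrites an earlier one:
    // every tag starts with "<" ("start"), "</..." becomes "end", ".../>" becomes "empty".
    static String markupType(String tag) {
        String type = "unknown";
        if (tag.startsWith("<")) type = "start";
        if (tag.startsWith("</")) type = "end";
        if (tag.endsWith("/>")) type = "empty";
        return type;
    }

    public static void main(String[] args) {
        for (String tag : List.of("<sub>", "<nadruk opmaak=\"cursief\">", "</nadruk>", "</sub>", "<br/>")) {
            System.out.println(tag + " -> name=" + markupName(tag) + ", type=" + markupType(tag));
        }
    }
}
```

For the tags in this question it yields the names sub and nadruk, so start and end tags of the same element get equal markupName values, which is what the SkipRegion rule compares.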

When I run the script on the following text:

<sub>Zie ook <nadruk opmaak="cursief">art. 1</nadruk> van de Werkloosheidswet</sub>

I expect the following annotations:

[
  {
    "name": "enhancedMarkup",
    "type": "Commons.EnhancedMarkup",
    "value": "<sub>",
    "startPosition": 0,
    "endPosition": 5,
    "source": "RUTA_ANNOTATION"
  },
  {
    "name": "skipRegion",
    "type": "Commons.SkipRegion",
    "value": "<nadruk opmaak=\"cursief\">art. 1</nadruk>",
    "startPosition": 13,
    "endPosition": 53,
    "source": "RUTA_ANNOTATION"
  },
  {
    "name": "enhancedMarkup",
    "type": "Commons.EnhancedMarkup",
    "value": "<nadruk opmaak=\"cursief\">",
    "startPosition": 13,
    "endPosition": 38,
    "source": "RUTA_ANNOTATION"
  },
  {
    "name": "enhancedMarkup",
    "type": "Commons.EnhancedMarkup",
    "value": "</nadruk>",
    "startPosition": 44,
    "endPosition": 53,
    "source": "RUTA_ANNOTATION"
  },
  {
    "name": "enhancedMarkup",
    "type": "Commons.EnhancedMarkup",
    "value": "</sub>",
    "startPosition": 77,
    "endPosition": 83,
    "source": "RUTA_ANNOTATION"
  },
  {
    "name": "skipRegion",
    "type": "Commons.SkipRegion",
    "value": "<sub>Zie ook <nadruk opmaak=\"cursief\">art. 1</nadruk> van de Werkloosheidswet</sub>",
    "startPosition": 0,
    "endPosition": 83,
    "source": "RUTA_ANNOTATION"
  }
]
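The two expected SkipRegion annotations follow if each start tag is paired with the first later end tag that has the same name. The Java sketch below models only that expectation (hypothetical class ExpectedSkipRegions; records need Java 16+). It reproduces the offsets listed above and is not meant to describe how the Ruta engine actually evaluates the # wildcard.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical model of the expected SkipRegions: every start tag is paired with
 * the first later end tag that has the same name. Not Ruta's matching algorithm.
 */
class ExpectedSkipRegions {

    record Tag(String name, String type, int begin, int end) {}
    record Region(int begin, int end) {}

    // One tag such as <sub>, </nadruk> or <nadruk opmaak="cursief">
    static final Pattern TAG = Pattern.compile("<[^>]+>");

    static List<Tag> tags(String text) {
        List<Tag> result = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            String ct = m.group();
            // Same two replacements as the Ruta CREATE rule
            String name = ct.replaceAll("\\s[^=]+=[\"'][^\"']+[\"']", "").replaceAll("[\\/<>]", "");
            // Same outcome as the three FOREACH rules (later rules win)
            String type = ct.endsWith("/>") ? "empty" : ct.startsWith("</") ? "end" : "start";
            result.add(new Tag(name, type, m.start(), m.end()));
        }
        return result;
    }

    static List<Region> skipRegions(String text) {
        List<Tag> tags = tags(text);
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < tags.size(); i++) {
            Tag start = tags.get(i);
            if (!start.type().equals("start")) continue;
            // Look for the first later end tag with the same name
            for (int j = i + 1; j < tags.size(); j++) {
                Tag end = tags.get(j);
                if (end.type().equals("end") && end.name().equals(start.name())) {
                    regions.add(new Region(start.begin(), end.end()));
                    break;
                }
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        String text = "<sub>Zie ook <nadruk opmaak=\"cursief\">art. 1</nadruk> van de Werkloosheidswet</sub>";
        for (Region r : skipRegions(text)) {
            System.out.println(r.begin() + "-" + r.end() + ": " + text.substring(r.begin(), r.end()));
        }
    }
}
```

On the input text it prints the regions 0-83 and 13-53 (end offsets exclusive), matching the two SkipRegion entries in the expected output.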

This worked in Ruta 2.7, but it no longer works in Ruta 2.8: all of the annotations above are still created except the following one, which is missing:

{
  "name": "skipRegion",
  "type": "Commons.SkipRegion",
  "value": "<sub>Zie ook <nadruk opmaak=\"cursief\">art. 1</nadruk> van de Werkloosheidswet</sub>",
  "startPosition": 0,
  "endPosition": 83,
  "source": "RUTA_ANNOTATION"
}
