Reputation: 91
Sample script:
DocumentAnnotation{-> RETAINTYPE(SPACE,BREAK)};
((NUM (SPECIAL NUM)?)|EntityType{FEATURE("entityType", "amount")})
(COMMA|SPACE|BREAK)*
((W|NUM) (SPACE | PERIOD)?)*
(COMMA|SPACE|BREAK)*
(((W|NUM) (SPACE | PERIOD)?)*(COMMA|SPACE|BREAK)*)
((EntityType+{FEATURE("entityType", "location_indicator")} | (NUM{REGEXP(".....")}
("-" NUM{REGEXP("....")})?))
(COMMA|SPACE|BREAK)*)+
{-> MARK(EntityType,1,8)};
NUM+
// 123-1
(SPECIAL NUM+)?
SPACE*
// Street lane road
((W|NUM) (SPACE | PERIOD)?)*
(COMMA|SPACE)*
// City
(W SPACE?)+
(COMMA|SPACE)*
// state
(W SPACE?)+
(COMMA|SPACE)*
// pincode
NUM
(COMMA|SPACE)*
W?{REGEXP("(?i)(USA|US|CANADA)") ->MARK(EntityType,1,2,3,4,5,6,7,8,9,10,11,12)};
We are getting OOM errors at random, but only in the prod environment; we are not able to reproduce them locally. Is there any indication that the script is the real problem? Below is the stack trace from a thread dump. Apart from this, we don't have access to the actual text that caused the error.
"EMAIL-Thread-1105" Id=37590 in RUNNABLE
BlockedCount : 328, BlockedTime : -1, WaitedCount : 48354, WaitedTime : -1
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.enforceUpdate(ComposedRuleElementMatch.java:57)
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.enforceUpdate(ComposedRuleElementMatch.java:57)
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.setInnerMatches(ComposedRuleElementMatch.java:63)
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.copy(ComposedRuleElementMatch.java:138)
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.copy(ComposedRuleElementMatch.java:35)
at org.apache.uima.ruta.rule.ComposedRuleElementMatch.copy(ComposedRuleElementMatch.java:131)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:208)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueMatch(ComposedRuleElement.java:233)
at org.apache.uima.ruta.rule.ComposedRuleElement.continueOwnMatch(ComposedRuleElement.java:370)
at org.apache.uima.ruta.rule.ComposedRuleElement.fallbackContinue(ComposedRuleElement.java:474)
at
Upvotes: 1
Views: 92
Reputation: 3113
It looks like your OOM is caused by an endless loop in which helper annotations are created. These annotations probably consume the memory. The loop could be triggered by a particular combination of characters or annotations that does not occur in your dev setting. The OOM could also be caused by the large rule containing many disjunctive rule elements.
In the latest release, several of these problems have already been fixed. I highly recommend upgrading to Ruta 3.2.0. Some of them are also fixed in the main-v2 branch.
An OOM can be the worst case for an application, and an exception may be preferable. You could restrict the number of allowed matches of rules and rule elements with the parameters PARAM_MAX_RULE_MATCHES and PARAM_MAX_RULE_ELEMENT_MATCHES. This can help ensure the stability of your application, but it is not a solution.
If updating the Ruta version is not an option, you can avoid the problem by refactoring your rules. I would recommend removing the disjunctive rule elements and splitting the rules into several smaller rules. That way, you can also avoid the stacked quantifiers. I would also recommend separating the whitespace-sensitive parts from the whitespace-insensitive parts.
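A rough sketch of what such a refactoring could look like is given here. It is untested and is not a drop-in replacement for the script in the question: the types HouseNumber, ZipCode, Country and Address are hypothetical helper names, the bound of 15 tokens is an arbitrary assumption, and the question's EntityType features are left out. The rules are deliberately loose (the house-number rule would also match other numbers), so treat it only as an illustration of the structure.

```
// Hypothetical helper types; none of these names appear in the original script.
DECLARE HouseNumber, ZipCode, Country, Address;

// Whitespace-sensitive part, isolated in one short rule: SPACE and BREAK are
// visible only here, so "12345-6789" matches but "12345 - 6789" does not.
Document{-> RETAINTYPE(SPACE, BREAK)};
(NUM{REGEXP(".....")} (SPECIAL{REGEXP("-")} NUM{REGEXP("....")})?){-> MARK(ZipCode)};
// Reset the filter to its default so that whitespace is skipped again.
Document{-> RETAINTYPE};

// Whitespace-insensitive part: with the default filter no explicit
// (COMMA|SPACE|BREAK)* elements are needed between the tokens.
// House number such as "123" or "123-1".
(NUM (SPECIAL NUM)?){-> MARK(HouseNumber)};
// Country name.
W{REGEXP("(?i)(USA|US|CANADA)") -> MARK(Country)};

// Combine the helpers in one small rule. A single bounded, reluctant gap
// replaces the stacked star quantifiers and the disjunctions.
(HouseNumber ANY[1,15]? ZipCode Country?){-> MARK(Address)};
```

Each rule has only a few elements and no nested `( ... )*` groups, so the matcher has far fewer alternatives to explore on unusual input. The bounded gap trades some recall for a predictable upper limit on the work done per match.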
Upvotes: 0