Reputation: 101
I am creating a TokensRegex annotator to extract the number of floors a building has (just an example to illustrate my question). I have a simple pattern that recognizes both "4 floors" and "four floors" as instances of my custom entity "FLOORS". I would also like to add a NormalizedNER annotation that uses the normalized value of the NUMBER entity in the expression, but I can't get it to work the way I want:
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
normalized = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NormalizedNamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }
ENV.defaults["ruleType"] = "tokens"
{
pattern: ( ( [ { ner:NUMBER } ] ) /floor(s?)/ ),
action: ( Annotate($0, ner, "FLOORS"), Annotate($0, normalized, $$1.text) )
}
The rule above only sets the NormalizedNER field in the output to the text of the number: "4" and "four" for the two examples, respectively. Is there a way to use the NUMBER entity's normalized value ("4.0" for both "4" and "four") as the normalized value for my "FLOORS" entity?
Thanks in advance.
Upvotes: 0
Reputation: 101
The correct answer is based on @AngelChang's answer and comment; I'm just posting it here for the sake of orderliness.
The rule has to be modified so that the third argument of the second Annotate() action is $1[0].normalized:
{
pattern: ( ( [ { ner:NUMBER } ] ) /floor(s?)/ ),
action: ( Annotate($0, ner, "FLOORS"), Annotate($0, normalized, $1[0].normalized) )
}
According to @Angel's comment:
$1[0].normalized is the "normalized" field of the 0th token of the 1st capture group (as a CoreLabel). The $$1 gives you back the MatchedGroupInfo which has the "text" field but not the normalized field (since that is on the actual token)
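To make the distinction in the quoted comment concrete, here is a minimal, stdlib-only Java toy model. It is hypothetical and does not use the CoreNLP API: tokens are plain maps of annotation keys, and the helper names (token, groupText, firstTokenNormalized) are invented for illustration. It only sketches why reading the group's text gives "4" or "four", while reading the normalized annotation stored on the group's first token gives "4.0" in both cases.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (hypothetical, NOT the CoreNLP API) of the difference between
 * $$1.text and $1[0].normalized in a TokensRegex action.
 */
public class FloorsNormalizationSketch {

    // Hypothetical stand-ins for the CoreAnnotations keys bound in the rules file.
    static final String TEXT = "text";
    static final String NER = "ner";
    static final String NORMALIZED = "normalized";

    // Builds one token as a map from annotation key to value,
    // loosely mirroring how a CoreLabel carries per-token annotations.
    static Map<String, String> token(String text, String ner, String normalized) {
        Map<String, String> t = new HashMap<>();
        t.put(TEXT, text);
        t.put(NER, ner);
        t.put(NORMALIZED, normalized);
        return t;
    }

    // Analogue of $$1.text: the group info only exposes the matched surface text.
    static String groupText(List<Map<String, String>> group) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> t : group) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.get(TEXT));
        }
        return sb.toString();
    }

    // Analogue of $1[0].normalized: read the annotation stored on the first token.
    static String firstTokenNormalized(List<Map<String, String>> group) {
        return group.get(0).get(NORMALIZED);
    }

    public static void main(String[] args) {
        // Capture group 1 for "4 floors" and for "four floors"; here we assume the
        // NER step already stored the normalized value "4.0" on both tokens.
        List<Map<String, String>> digits = List.of(token("4", "NUMBER", "4.0"));
        List<Map<String, String>> words = List.of(token("four", "NUMBER", "4.0"));

        System.out.println(groupText(digits) + " -> " + firstTokenNormalized(digits));
        System.out.println(groupText(words) + " -> " + firstTokenNormalized(words));
    }
}
```

Running main prints "4 -> 4.0" and "four -> 4.0", which mirrors the difference between the question's original rule and the corrected one.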
Upvotes: 0
Reputation: 161
With $$1.normalized as you suggested, running on the input "The building has seven floors" yields the following error message: Annotating file test.txt { Error extracting annotation from seven floors }
It might be because the NormalizedNamedEntityTagAnnotation key is not already present for the token represented by $$1. I suppose that, before running TokensRegex, you'd want to make sure your numeric tokens (either "four" or "4" in this case) have the corresponding normalized value ("4.0" in this case) set under their NormalizedNamedEntityTagAnnotation key.
Also, could you please direct me to where I can find more information on the possible 3rd arguments of Annotate()? Your Javadoc page for TokensRegex expressions doesn't list $$n.normalized (perhaps it needs updating?)
I believe $$n.normalized would retrieve the value that, in Java code, is the equivalent of coreLabel.get(edu.stanford.nlp.ling.CoreAnnotations.NormalizedNamedEntityTagAnnotation.class), where coreLabel is the CoreLabel corresponding to $$n in TokensRegex.
This is because of the following line in your TokensRegex rules: normalized = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NormalizedNamedEntityTagAnnotation" }
Upvotes: 0
Reputation: 364
Try changing
action: ( Annotate($0, ner, "FLOORS"), Annotate($0, normalized, $$1.text) )
to
action: ( Annotate($0, ner, "FLOORS"), Annotate($0, normalized, $$1.normalized) )
Annotate() takes three arguments: the matched expression, the annotation key, and the value to set.
Upvotes: 1