In my model file I am using a macro with a regex that extracts any space-separated alphanumeric words, in order to capture a user-input sentence, i.e.:
macros:
  - name: "<GENERIC_INPUT>"
    macro: "{//[a-zA-Z0-9 ]+//}"
Then I try to capture it in the element as follows:
elements:
  - id: "prop:title"
    description: Set title
    synonyms:
      - "{set|add} title <GENERIC_INPUT>"
The intent is defined as follows:
intents:
  - "intent=myIntent term(createStory)~{tok_id() == 'prop:createStory'} term(title)~{tok_id() == 'prop:title'}?"
In the Java model I capture the title property like this:
public NCResult onMatch(
    NCIntentMatch ctx,
    @NCIntentTerm("createStory") NCToken createStory,
    @NCIntentTerm("title") Optional<NCToken> titleList) {
    ...
When I run a query against the REST API service the probe is deployed in, I only get the first word matched by <GENERIC_INPUT> (the regular expression), which is the last element of the synonym {set|add} title <GENERIC_INPUT>, i.e.:
HTTP 200 [235ms]
{
  "status": "API_OK",
  "state": {
    "resType": "json",
    "mdlId": "Create Story",
    "txt": "set title this is my story",
    "resMeta": {},
    "srvReqId": "GKDY-QLBM-B6TQ-7KYO-KMR8",
    "status": "QRY_READY",
    "resBody": {
      "title": "set title this",
      "createStory": true
    },
    "usrId": 1,
    "intentId": "myIntent"
  }
}
In resBody.title I get "set title this" rather than the whole string "set title this is my story", which the regex should allow.
Any idea why? How can I get it to extract the whole title?
Many thanks
Do you know if Apache NLPCraft provides a built-in method to extract quoted sentences as well, i.e. 'some sentence like this one'?
There are a few workarounds for such a request; some of them seem like hacks. I guess the most straightforward solution is the following:
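package org.apache.nlpcraft.examples.lightswitch;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.nlpcraft.model.*;

// Marks every word of the request as a "qElem" element when the whole
// request text is wrapped in single quotes with no other quote inside.
public class QuotedSentenceParser implements NCCustomParser {
    @Override
    public List<NCCustomElement> parse(NCRequest req, NCModelView mdl, List<NCCustomWord> words, List<NCCustomElement> elements) {
        String txt = req.getNormalizedText();

        if (
            txt.length() >= 2 && // Guard: charAt() below needs at least two characters.
            txt.charAt(0) == '\'' &&
            txt.charAt(txt.length() - 1) == '\'' &&
            !txt.substring(1, txt.length() - 1).contains("'")
        )
            // One custom element per word, all sharing the "qElem" ID.
            return words.stream().map(
                w -> new NCCustomElement() {
                    @Override
                    public String getElementId() {
                        return "qElem";
                    }

                    @Override
                    public List<NCCustomWord> getWords() {
                        return Collections.singletonList(w);
                    }

                    @Override
                    public Map<String, Object> getMetadata() {
                        return Collections.emptyMap();
                    }
                }
            ).collect(Collectors.toList());

        // Not a fully quoted sentence: add no custom elements.
        return null;
    }
}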
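The quote check inside parse() can be tried on its own, outside of NLPCraft. The sketch below pulls it into a hypothetical QuotedTextCheck class (the isFullyQuoted and unquote names are made up for illustration) and keeps the length guard, so an empty or one-character text does not make charAt() throw.

```java
// Hypothetical helper class mirroring the condition in QuotedSentenceParser.parse():
// the whole text must start and end with a single quote and contain no other
// single quote in between.
public class QuotedTextCheck {
    static boolean isFullyQuoted(String txt) {
        // Guard against empty or one-character input before calling charAt().
        return txt.length() >= 2
            && txt.charAt(0) == '\''
            && txt.charAt(txt.length() - 1) == '\''
            && !txt.substring(1, txt.length() - 1).contains("'");
    }

    // Returns the text between the quotes, or null if the text is not fully quoted.
    static String unquote(String txt) {
        return isFullyQuoted(txt) ? txt.substring(1, txt.length() - 1) : null;
    }

    public static void main(String[] args) {
        System.out.println(isFullyQuoted("'Set title a b c'")); // true
        System.out.println(isFullyQuoted("Set title 'a b c'")); // false
        System.out.println(unquote("'Set title a b c'"));       // Set title a b c
    }
}
```

Note that only a request that is quoted as a whole passes this check; a quoted fragment inside a longer sentence does not.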
Model (note the dummy qElem element here. It seems like a bug or an unclear feature: I am pretty sure that defining this element ID dynamically in QuotedSentenceParser should be enough):
elements:
  - id: "qElem"
    description: "Set title"
    synonyms:
      - "-"
intents:
  - "intent=test term(qElem)={# == 'qElem'}*"
parsers:
  - "org.apache.nlpcraft.examples.lightswitch.QuotedSentenceParser"
Callback:
@NCIntentRef("test")
@NCIntentSample({
    "'Set title a b c'"
})
NCResult x(NCIntentMatch ctx, @NCIntentTerm("qElem") List<NCToken> qElems) {
    System.out.println(qElems.stream().map(p -> p.getNormalizedText()).collect(Collectors.joining("|")));
    return NCResult.text("OK");
}
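The regex <GENERIC_INPUT> can match an individual token, but not a group of tokens. That is why the synonym {set|add} title <GENERIC_INPUT> covers only three tokens, and the title comes back as "set title this".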
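To make that concrete, here is a small plain-Java model of a synonym whose chunks each match exactly one token. It is a hypothetical simplification (PerTokenSynonymDemo, match and setTitleGeneric are made-up names), not NLPCraft's real matcher; it only assumes the one-chunk-per-token behavior described above, and with that assumption it reproduces the "set title this" result from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical, simplified model of synonym matching: every chunk, including
// the //regex// chunk, is tested against exactly one token.
// This is NOT NLPCraft's actual implementation.
public class PerTokenSynonymDemo {
    static final Pattern GENERIC_INPUT = Pattern.compile("[a-zA-Z0-9 ]+");

    // Chunks of "{set|add} title <GENERIC_INPUT>": one predicate per token.
    static List<Predicate<String>> setTitleGeneric() {
        return List.of(
            t -> t.equals("set") || t.equals("add"),
            t -> t.equals("title"),
            t -> GENERIC_INPUT.matcher(t).matches()
        );
    }

    // Matches chunks against the leading tokens, one chunk per token, and
    // returns the covered text, or null if any chunk fails.
    static String match(List<Predicate<String>> synonym, String[] tokens) {
        if (tokens.length < synonym.size()) return null;
        List<String> covered = new ArrayList<>();
        for (int i = 0; i < synonym.size(); i++) {
            if (!synonym.get(i).test(tokens[i])) return null;
            covered.add(tokens[i]);
        }
        return String.join(" ", covered);
    }

    public static void main(String[] args) {
        String[] tokens = "set title this is my story".split(" ");
        // Three chunks can cover at most three tokens.
        System.out.println(match(setTitleGeneric(), tokens)); // set title this
    }
}
```

Even though the regex itself allows spaces, under this model it never sees more than one token, so the space in the character class does not help.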
Please try it this way.
Model:
elements:
  - id: "prop:title"
    description: "Set title"
    synonyms:
      - "{set|add} title"
  - id: "prop:any"
    description: "Set any"
    synonyms:
      - "//[a-zA-Z0-9 ]+//"
intents:
  - "intent=test term(title)={# == 'prop:title'} term(any)={# == 'prop:any'}*"
Callback:
@NCIntentRef("test")
@NCIntentSample({
    "Set title 1 2",
    "Set title a b c"
})
NCResult x(
    NCIntentMatch ctx,
    @NCIntentTerm("title") NCToken title,
    @NCIntentTerm("any") List<NCToken> any) {
    System.out.println("title=" + title.getNormalizedText());
    System.out.println("any=" + any.stream().map(NCToken::getNormalizedText).collect(Collectors.joining("|")));
    return NCResult.text("OK");
}
It should work.
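The callback above joins the any tokens with "|" only to display them. To get a usable title, one option is to order the tokens by position and join them with spaces. Below is a plain-Java sketch with a hypothetical Tok record standing in for NCToken; it keeps only the two values the callback reads (getNormalizedText() and getStartCharIndex()) and needs Java 16+ for records.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Sketch of turning the tokens collected by term(any)* into one title string.
// Tok is a hypothetical stand-in for NCToken, not an NLPCraft type.
public class JoinAnyTokens {
    record Tok(String normalizedText, int startCharIndex) {}

    // Orders tokens by their position in the sentence and joins them with spaces.
    static String joinInOrder(List<Tok> any) {
        return any.stream()
            .sorted(Comparator.comparingInt(Tok::startCharIndex))
            .map(Tok::normalizedText)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "Set title a b c": the words after "title" start at indexes 10, 12 and 14.
        List<Tok> any = List.of(new Tok("c", 14), new Tok("a", 10), new Tok("b", 12));
        System.out.println(joinInOrder(any)); // a b c
    }
}
```

Sorting by start index makes the result independent of the order in which the tokens arrive; joining normalized text does lose the original casing.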
But please also consider dropping the regex here: it can be too slow, and you will get many garbage variants.
Instead, you can use a single element in the intent and extract the following words in the callback.
Model:
elements:
  - id: "prop:title"
    description: "Set title"
    synonyms:
      - "{set|add} title"
intents:
  - "intent=test term(title)={# == 'prop:title'}"
Callback:
@NCIntentRef("test")
@NCIntentSample({
    "Set title 1 2",
    "Set title a b c"
})
NCResult x(
    NCIntentMatch ctx,
    @NCIntentTerm("title") NCToken title) {
    System.out.println("title=" + title.getNormalizedText());
    System.out.println("any after=" +
        Stream.concat(
            ctx.getVariant().getFreeTokens().stream(),
            ctx.getVariant().getStopWordTokens().stream()
        )
        .sorted(Comparator.comparingInt(NCToken::getStartCharIndex))
        .filter(p -> p.getStartCharIndex() > title.getStartCharIndex())
        .map(NCToken::getNormalizedText)
        .collect(Collectors.joining("|"))
    );
    return NCResult.text("OK");
}
Same result, but without the regex.
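The same filtering can be checked without a running probe. This sketch mirrors the callback's stream pipeline on a hypothetical Tok record instead of NCToken, and for the example data it simply assumes that "is" and "my" are reported as stop words and the other words as free tokens (how NLPCraft actually classifies them may differ). It joins with spaces rather than "|" to produce a usable title.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java sketch of the callback logic: merge free and stop-word tokens,
// keep those starting after the title element, order by position, join.
// Tok is a hypothetical stand-in for NCToken, not an NLPCraft type.
public class TextAfterTitle {
    record Tok(String normalizedText, int startCharIndex) {}

    static String after(Tok title, List<Tok> free, List<Tok> stop) {
        return Stream.concat(free.stream(), stop.stream())
            .filter(t -> t.startCharIndex() > title.startCharIndex())
            .sorted(Comparator.comparingInt(Tok::startCharIndex))
            .map(Tok::normalizedText)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "set title this is my story": the title element starts at index 0.
        // Assumed split (example data only): "is" and "my" are stop words.
        Tok title = new Tok("set title", 0);
        List<Tok> free = List.of(new Tok("this", 10), new Tok("story", 21));
        List<Tok> stop = List.of(new Tok("is", 15), new Tok("my", 18));
        System.out.println(after(title, free, stop)); // this is my story
    }
}
```

Including the stop-word tokens matters: without them, words like "is" and "my" would silently drop out of the title.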