Cesare Contini

Reputation: 11

How can I extract a full sentence using Apache NLPCraft?

In my model file I am using a macro with a regex that matches any space-separated alphanumeric words, to capture a user-input sentence, i.e.:

macros:
  - name: "<GENERIC_INPUT>"
    macro: "{//[a-zA-Z0-9 ]+//}"

Then I try to capture it in the element as follows:

elements:
  - id: "prop:title"
    description: Set title
    synonyms:
      - "{set|add} title <GENERIC_INPUT>"

The intent is defined as follows:

intents:
 - "intent=myIntent term(createStory)~{tok_id() == 'prop:createStory'} term(title)~{tok_id() == 'prop:title'}?"

In the Java Model I am correctly capturing the title property:

public NCResult onMatch(
            NCIntentMatch ctx,
            @NCIntentTerm("createStory") NCToken createStory,
            @NCIntentTerm("title") Optional<NCToken> titleList) {
    // ...
}

When I run a query against the REST API server where the probe is deployed, I only get the first word matched by <GENERIC_INPUT> (the regex macro at the end of the synonym {set|add} title <GENERIC_INPUT>), i.e.:

HTTP 200 [235ms]
{
  "status": "API_OK",
  "state": {
    "resType": "json",
    "mdlId": "Create Story",
    "txt": "set title this is my story",
    "resMeta": {},
    "srvReqId": "GKDY-QLBM-B6TQ-7KYO-KMR8",
    "status": "QRY_READY",
    "resBody": {
      "title": "set title this",
      "createStory": true
    },
    "usrId": 1,
    "intentId": "myIntent"
  }
}

In resBody.title I get "set title this" rather than the whole string that the regex should allow, i.e. "set title this is my story".

Any idea why? How can I get it to extract the whole title?

Many thanks

Upvotes: 0

Views: 97

Answers (2)

Sergey K

Reputation: 51

> Do you know if Apache NLPCraft also provides a built-in method to extract quoted sentences, i.e. 'some sentence like this one'?

There are a few workarounds for such a request, and some of them look like hacks. I think the most straightforward solution is the following:

  1. Create an NCCustomParser:
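package org.apache.nlpcraft.examples.lightswitch;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

import org.apache.nlpcraft.model.*;

// Marks every word of a single-quoted request as a "qElem" element.
public class QuotedSentenceParser implements NCCustomParser {
    @Override
    public List<NCCustomElement> parse(NCRequest req, NCModelView mdl, List<NCCustomWord> words, List<NCCustomElement> elements) {
        String txt = req.getNormalizedText();

        // Only handle text wrapped in one pair of single quotes with no quotes inside.
        // The length check avoids exceptions on empty or one-character input.
        if (
            txt.length() >= 2 &&
            txt.charAt(0) == '\'' &&
            txt.charAt(txt.length() - 1) == '\'' &&
            !txt.substring(1, txt.length() - 1).contains("'")
        )
            // Wrap each word into its own "qElem" element.
            return words.stream().map(
                w -> new NCCustomElement() {
                    @Override
                    public String getElementId() {
                        return "qElem";
                    }

                    @Override
                    public List<NCCustomWord> getWords() {
                        return Collections.singletonList(w);
                    }

                    @Override
                    public Map<String, Object> getMetadata() {
                        return Collections.emptyMap();
                    }
                }
            ).collect(Collectors.toList());

        // Not a quoted sentence: this parser detects no elements.
        return null;
    }
}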
  1. Add the configuration. (Note that you have to declare the dummy element qElem here. This looks like a bug or an unclear feature; I would expect defining this element ID dynamically in QuotedSentenceParser to be enough.)
    elements:
      - id: "qElem"
        description: "Set title"
        synonyms:
          - "-"
    
    intents:
      - "intent=test term(qElem)={# == 'qElem'}*"
    
    parsers:
      - "org.apache.nlpcraft.examples.lightswitch.QuotedSentenceParser"
  1. Usage
@NCIntentRef("test")
@NCIntentSample({
    "'Set title a b c'"
})
NCResult x(NCIntentMatch ctx, @NCIntentTerm("qElem") List<NCToken> qElems) {
    System.out.println(qElems.stream().map(p -> p.getNormalizedText()).collect(Collectors.joining("|")));

    return NCResult.text("OK");
}
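The quote test inside parse() can also be tried on its own, outside NLPCraft. The helper below is a hypothetical standalone copy of that condition (QuoteCheck and isSingleQuotedSentence are illustrative names, not NLPCraft API), with a length guard so that empty or one-character input returns false instead of throwing:

```java
// Hypothetical standalone copy of the condition used in QuotedSentenceParser.parse().
class QuoteCheck {
    static boolean isSingleQuotedSentence(String txt) {
        if (txt == null || txt.length() < 2) {
            return false; // too short to have both an opening and a closing quote
        }
        String inner = txt.substring(1, txt.length() - 1);
        return txt.charAt(0) == '\''
            && txt.charAt(txt.length() - 1) == '\''
            && !inner.contains("'"); // reject quotes inside the sentence
    }

    public static void main(String[] args) {
        System.out.println(isSingleQuotedSentence("'Set title a b c'")); // true
        System.out.println(isSingleQuotedSentence("Set title a b c"));   // false
        System.out.println(isSingleQuotedSentence("'it's mine'"));       // false
        System.out.println(isSingleQuotedSentence("'"));                 // false
    }
}
```

Note that the inner-quote rule also rejects sentences with apostrophes, such as 'it's mine'; whether that matters depends on your inputs.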

Upvotes: 0

Sergey K

Reputation: 51

A regex such as <GENERIC_INPUT> can match an individual token, but not a group of tokens.
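To illustrate the difference, here is a plain-Java simulation. It does not call NLPCraft; it assumes a simple whitespace tokenizer, and the class and method names are hypothetical. Matched against the whole string, the macro's character class accepts the full title, but matched token by token, each hit is a single word:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model of per-token regex matching (assumption: whitespace tokenizer).
class PerTokenRegexDemo {
    // Same character class as the <GENERIC_INPUT> macro, without the // delimiters.
    static final Pattern GENERIC_INPUT = Pattern.compile("[a-zA-Z0-9 ]+");

    // Regex applied to the whole string: the full sentence matches.
    static boolean matchesWhole(String text) {
        return GENERIC_INPUT.matcher(text).matches();
    }

    // Regex applied to each token separately: every match is one word,
    // so no single match can span "this is my story".
    static List<String> perTokenMatches(String text) {
        List<String> matches = new ArrayList<>();
        for (String tok : text.trim().split("\\s+")) {
            if (GENERIC_INPUT.matcher(tok).matches()) {
                matches.add(tok);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String title = "this is my story";
        System.out.println(matchesWhole(title));    // true
        System.out.println(perTokenMatches(title)); // [this, is, my, story]
    }
}
```

This is why the prop:any term below uses the * quantifier: each matching token becomes its own element, and the callback receives them as a list.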

Please try it this way:

elements:

  - id: "prop:title"
    description: "Set title"
    synonyms:
      - "{set|add} title"

  - id: "prop:any"
    description: "Set any"
    synonyms:
      - "//[a-zA-Z0-9 ]+//"

intents:
  - "intent=test term(title)={# == 'prop:title'} term(any)={# == 'prop:any'}*"

Callback:

@NCIntentRef("test")
@NCIntentSample({
    "Set title 1 2",
    "Set title a b c"
})
NCResult x(
    NCIntentMatch ctx,
    @NCIntentTerm("title") NCToken title,
    @NCIntentTerm("any") List<NCToken> any) {

    System.out.println("title=" + title.getNormalizedText());
    System.out.println("any=" + any.stream().map(NCToken::getNormalizedText).collect(Collectors.joining("|")));

    return NCResult.text("OK");
}

It should work.

But please also consider dropping the regex here: it can be slow, and it produces many garbage variants.

Instead, you can use a single element in the intent and extract the words that follow it in the callback:

Model:

elements:

  - id: "prop:title"
    description: "Set title"
    synonyms:
      - "{set|add} title"

intents:
  - "intent=test term(title)={# == 'prop:title'}"

Callback:

@NCIntentRef("test")
@NCIntentSample({
    "Set title 1 2",
    "Set title a b c"
})
NCResult x(
    NCIntentMatch ctx,
    @NCIntentTerm("title") NCToken title) {

    System.out.println("title=" + title.getNormalizedText());

    System.out.println("any after=" +
            Stream.concat(
                ctx.getVariant().getFreeTokens().stream(),
                ctx.getVariant().getStopWordTokens().stream()
            ).sorted(Comparator.comparingInt(NCToken::getStartCharIndex)).
            filter(p -> p.getStartCharIndex() > title.getStartCharIndex()).
            map(NCToken::getNormalizedText).
            collect(Collectors.joining("|"))
    );

    return NCResult.text("OK");
}

Same result, but without regex.
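The filtering in that callback can be sketched in plain Java. Tok is a hypothetical stand-in for NCToken that keeps only the normalized text and the start character index, and the positions in main are hand-computed for "set title this is my story". Joining with a space instead of | rebuilds the title as one string:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java model of the callback logic; Tok is a hypothetical stand-in for NCToken.
class TitleTail {
    record Tok(String text, int start) {}

    // Keep tokens that start after the title element, order them by
    // position in the input, and join them back into one string.
    static String textAfter(Tok title, List<Tok> freeAndStopTokens) {
        return freeAndStopTokens.stream()
            .filter(t -> t.start() > title.start())
            .sorted(Comparator.comparingInt(Tok::start))
            .map(Tok::text)
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Assume the title element covers "set title" starting at index 0.
        Tok title = new Tok("set title", 0);
        List<Tok> rest = List.of(
            new Tok("my", 18), new Tok("this", 10),
            new Tok("story", 21), new Tok("is", 15));
        System.out.println(textAfter(title, rest)); // this is my story
    }
}
```

Joining normalized texts with single spaces is a simplification and may not reproduce the user's original spacing or punctuation exactly.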

Upvotes: 0
