Sourav Kannantha B

Reputation: 3299

Rasa `RegexEntityExtractor` extracting non-entities as entities

I am trying to extract user names from the input. This is the relevant training data (trimmed, since the full data would bloat the question):

nlu:

- intent: greet
  examples: |
    - hey
    - hello
    - hi
    - hello there

- intent: en_name
  examples: |
    - my name is [adarsh](name)
    - name is [suma](name)
    - I am [sourav](name)

- regex: name
  examples: |
    - \w{2,40}
stories:

- story: Start conversation
  steps:
    - intent: greet
    - action: utter_greet
    - action: utter_help
    - intent: seek_scheme
    - action: user_info_form
    - active_loop: user_info_form
rules:

- rule: activate form
  steps:
    - intent: seek_scheme
    - action: user_info_form
    - active_loop: user_info_form

- rule: submit form
  condition:
    - active_loop: user_info_form
  steps:
    - action: user_info_form
    - active_loop: null
    - slot_was_set:
        - requested_slot: null
    - action: utter_submit
    - action: utter_slot_values

- rule: Deviated form path
  condition:
    - active_loop: user_info_form
  steps:
    - intent: bot_challenge
    - action: utter_iamabot
    - action: user_info_form
    - active_loop: user_info_form

Since I am using a regex, I had to add RegexEntityExtractor to my pipeline. But after adding it, even the input "hi" is extracted as a name entity. This is a sample from a rasa interactive session:

? Your input -> hi
? Is the intent 'greet' correct for '[hi](name)' and are all entities labeled correctly? No
? What intent is it? 1.00 greet
? Please mark the entities using [value](type) notation hi

In the same way, every word I provide gets tagged as the name entity. I guess this is because RegexEntityExtractor doesn't take the intent into account when extracting entities. How can I fix this?

Upvotes: 1

Views: 1903

Answers (1)

matthiasleimeister

Reputation: 111

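I think this follows from how your name regex is defined: \w{2,40} matches any run of 2 to 40 word characters (letters, digits, or underscores), so practically every word, including "hi", will be matched and labelled as a name entity.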

According to the documentation, the RegexEntityExtractor will use the defined regexes and lookup tables. So if you have a list of names you want to extract, you can add them in a lookup table.
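
For the lookup-table route, a minimal sketch could look like the following, assuming the Rasa 2.x/3.x YAML training data format. The three names are just the ones from the question's examples; a real table would need every name you expect users to enter. The catch-all `- regex: name` entry would have to be removed, otherwise \w{2,40} keeps matching everything.

```yaml
# nlu.yml (sketch): a lookup table in place of the catch-all "- regex: name" entry.
# The entries below are placeholders copied from the question's training examples.
nlu:
- lookup: name
  examples: |
    - adarsh
    - suma
    - sourav
```

Keep in mind that with this setup only names that appear in the table are extracted, so it fits closed lists better than free-form user names.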

If you want to learn the entities from the training data, you can use trainable extractors like DIETClassifier or CRFEntityExtractor, or pre-trained models that extract person names such as SpacyEntityExtractor.
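
As a rough sketch of the trainable route, assuming an English bot and a Rasa 2.x/3.x style config.yml: drop RegexEntityExtractor from the pipeline and let DIETClassifier learn the name entity from the annotated examples. The component list and the epoch count below are illustrative assumptions, not taken from the question.

```yaml
# config.yml (illustrative sketch; component choices and epochs are assumptions)
language: en

pipeline:
  - name: WhitespaceTokenizer
  # Turns regexes and lookup tables into features for the classifier;
  # unlike RegexEntityExtractor, it does not tag entities by itself.
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
  # Learns intents and entities (including "name") from the annotated examples.
  - name: DIETClassifier
    epochs: 100
```

With only three annotated examples in en_name the model will probably generalise poorly, so you would likely need to add more sentences with varied names.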

You can also check out the Rasa Community Forum for related threads and suggested pipeline configs.

Upvotes: 1
