Luca J

Reputation: 27

Multi-intent natural language processing and classification

So, I'm making my own home assistant and I'm trying to make a multi-intent classification system. However, I cannot find a way to split the query said by the user into the multiple different intents in the query.

For example:

I have my data for one of my intents (same format for all) 

{"intent_name": "music.off", "examples": ["turn off the music", "kill the music", "cut the music"]}

and the query said by the user would be:

'dim the lights, cut the music and play Black Mirror on tv'

I want to split the sentence into its individual intents, such as:

['dim the lights', 'cut the music', 'play black mirror on tv']

however, I can't simply use re.split on the sentence with "and" and "," as delimiters, because if the user asks:

'turn the lights off in the living room, dining room, kitchen and bedroom'

this will be split into

['turn the lights off in the living room', 'dining room', 'kitchen', 'bedroom']

which would not be usable with my intent detection.

This is my problem. Thank you in advance.

UPDATE

Okay, so I've got this far with my code: it can read the examples from my data and identify the different intents in the query, as I wished. However, it is not splitting the parts of the original query into their individual intents - it is just matching against the stored examples.

import os
import json
from fuzzywuzzy import process

text = "dim the lights, shut down the music and play White Collar"

choices = []
commands = []

def get_matches():

    # collect the example lists from every intent file under ./data
    for root, dirs, files in os.walk("./data"):
        for filename in files:
            with open(os.path.join(root, filename), "r") as f:
                data = json.load(f)
            choices.append(data["examples"])

    # best fuzzy match between the whole query and each intent's examples
    for set_ in choices:
        command = process.extract(text, set_, limit=1)
        commands.append(command)

    print(f"all commands : {commands}")

this returns [('dim the lights'), ('turn off the music'), ('play Black Mirror')], which are the correct intents, but I have no way of knowing which part of the query relates to each matched intent - this is the main problem.

My data is as follows - very simple for now, until I figure out a method:

play.json

{"intent_name": "play.device" , "examples" : ["play Black Mirror" , "play Netflix on tv" , "can you please stream Stranger Things"]}


music.json

{"intent_name": "music.off" , "examples": ["turn off the music" , "cut the music" , "kill the music"]}


lights.json

{"intent_name": "lights.dim" , "examples" : ["dim the lights" , "turn down the lights" , "lower the brightness"]}

Upvotes: 1

Views: 2179

Answers (1)

David Dale

Reputation: 11424

It seems that you are mixing two problems in your question:

  1. Multiple independent intents within a single query (e.g. shut down the music and play White Collar)
  2. Multiple slots (using the form-filling framework) within a single intent (e.g. turn the lights off in the living room bedroom and kitchen).

These problems are quite different. Both, however, can be formulated as a word-tagging problem (similar to POS tagging) and solved with machine learning (e.g. a CRF or a bi-LSTM over pretrained word embeddings, predicting a label for each word).
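If you go the CRF route, the tagger typically sees each word as a dict of hand-crafted features rather than raw text. A minimal sketch of such a feature function (the exact features are illustrative, not prescriptive - libraries such as sklearn-crfsuite consume exactly this list-of-dicts shape):

```python
def word_features(words, i):
    """Illustrative per-word features for a CRF-style tagger."""
    word = words[i]
    feats = {
        "word.lower": word.lower(),       # the word itself, case-folded
        "word.istitle": word.istitle(),   # capitalised words hint at titles ("White Collar")
        "prefix3": word[:3].lower(),      # cheap stemming
        "is_first": i == 0,
        "is_last": i == len(words) - 1,
    }
    # context features: the neighbouring words, when they exist
    if i > 0:
        feats["prev.lower"] = words[i - 1].lower()
    if i < len(words) - 1:
        feats["next.lower"] = words[i + 1].lower()
    return feats

def sentence_features(sentence):
    """One feature dict per word of the sentence."""
    words = sentence.split()
    return [word_features(words, i) for i in range(len(words))]
```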

The intent labels for each word can be created using BIO notation, e.g.

shut   B-music_off
down   I-music_off
the    I-music_off
music  I-music_off
and    O
play   B-tv_on
White  I-tv_on
Collar I-tv_on

turn    B-light_off
the     I-light_off
lights  I-light_off
off     I-light_off
in      I-light_off
the     I-light_off
living  I-light_off
room    I-light_off
bedroom I-light_off
and     I-light_off
kitchen I-light_off
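Decoding such tags back into per-intent segments is mechanical. A minimal sketch, assuming the tokens and predicted labels arrive as parallel lists (the function name is mine):

```python
def bio_to_segments(tokens, labels):
    """Group BIO-labelled tokens into (intent, text) segments.

    'O' tokens (connectors like 'and') are skipped; a 'B-' label
    starts a new segment, 'I-' continues the current one.
    """
    segments = []
    for token, label in zip(tokens, labels):
        if label == "O":
            continue
        prefix, intent = label.split("-", 1)
        # open a new segment on 'B-', or on a stray 'I-' with no matching segment
        if prefix == "B" or not segments or segments[-1][0] != intent:
            segments.append((intent, [token]))
        else:
            segments[-1][1].append(token)
    return [(intent, " ".join(words)) for intent, words in segments]
```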

The model would read the sentence and predict the labels. It should be trained on at least hundreds of examples - you have to generate or mine them.
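One way to generate such data: since each intent file already lists single-intent examples, multi-intent training sentences can be synthesized by joining pairs of them with connector words and labelling each word along the way. A sketch assuming the JSON format from the question (the connector and the pairwise strategy are illustrative):

```python
import itertools

def label_example(text, intent):
    """BIO-label every word of a single-intent example."""
    words = text.split()
    return [(w, ("B-" if i == 0 else "I-") + intent)
            for i, w in enumerate(words)]

def synthesize(intents, connector="and"):
    """Join pairs of single-intent examples into multi-intent
    sentences with word-level BIO labels; the connector is 'O'."""
    samples = []
    for (name_a, exs_a), (name_b, exs_b) in itertools.permutations(intents.items(), 2):
        for ex_a, ex_b in itertools.product(exs_a, exs_b):
            samples.append(label_example(ex_a, name_a)
                           + [(connector, "O")]
                           + label_example(ex_b, name_b))
    return samples
```

Each sample is a list of (word, tag) pairs, ready to feed a tagger; with a handful of examples per intent this quickly yields the hundreds of sentences the model needs.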

After splitting intents with a model trained on such labels, you will have short texts, each corresponding to a single intent. Then, for each short text, you need to run a second segmentation, looking for slots. E.g. the sentence about the lights can be presented as

turn    B-action
the     I-action
lights  I-action
off     I-action
in      O
the     B-place
living  I-place
room    I-place
bedroom B-place
and     O
kitchen B-place   

Now the BIO markup helps a lot: the B-place tag separates bedroom from the living room.
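A decoder for the slot level can reuse the same B-/I- convention. A minimal sketch (again, the name is mine), where a B- prefix always opens a new span - which is exactly what keeps bedroom apart from living room:

```python
def bio_to_slots(tokens, labels):
    """Collect BIO-labelled spans into {slot: [values]}.

    'B-' always opens a new span; 'I-' extends the current one;
    'O' closes whatever span was open.
    """
    slots, current = {}, None
    for token, label in zip(tokens, labels):
        if label == "O":
            current = None
            continue
        prefix, slot = label.split("-", 1)
        if prefix == "B" or current != slot:
            slots.setdefault(slot, []).append([token])  # new span
            current = slot
        else:
            slots[slot][-1].append(token)               # extend span
    return {slot: [" ".join(span) for span in spans]
            for slot, spans in slots.items()}
```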

Both segmentations can in principle be performed by one hierarchical end-to-end model (google "semantic parsing" if you want to go that way), but I feel that two simpler taggers can work just as well.

Upvotes: 2
