Sandipan Dey

Reputation: 23101

Training Watson Discovery with (V1) python SDK APIs does not work

I want to use the Watson Discovery V1 APIs for relevancy training. I tried the following but have yet to get the desired result. The problem is described in detail below:

I have a set of documents, some of which contain the word 'cloud' or 'big data'. I want to search for the word 'hadoop' with the query() API and get those documents back, but the Discovery query returns nothing.

Now I want to provide the following training examples to Discovery to update the relevance scores so that I get those results back. (I used query expansion for the same task and it worked; now I am interested in relevancy training.)

I used the add_training_data() API to associate the query 'hadoop' with the relevant documents (specified by ID, e.g. the documents that contain 'cloud').

Now the training data looks like the following:

{
  "natural_language_query": "hadoop",
  "filter": "",
  "examples": [
    {
      "document_id": "1ad6f551-e092-4ce9-b08c-eb4f4cbc9458",
      "cross_reference": "",
      "relevance": 1,
      "created": "2020-01-30T23:16:19.674Z",
      "updated": "2020-01-30T23:16:19.716Z"
    },
    {
      "document_id": "f1d11f51-31b2-414f-b359-d5336b019575",
      "cross_reference": "",
      "relevance": 1,
      "created": "2020-01-30T23:16:19.674Z",
      "updated": "2020-01-30T23:16:19.722Z"
    },
    {
      "document_id": "5bfcea6a-c925-4db5-a490-89a9d1de8d4c",
      "cross_reference": "",
      "relevance": 1,
      "created": "2020-01-30T23:16:19.674Z",
      "updated": "2020-01-30T23:16:19.729Z"
    },
    {
      "document_id": "bf07e701-6893-428c-ab16-c5446e821291",
      "cross_reference": "",
      "relevance": 1,
      "created": "2020-01-30T23:16:19.674Z",
      "updated": "2020-01-30T23:16:19.735Z"
    },
    {
      "document_id": "75082812-5c96-4d2e-b388-821a0434ad4c",
      "cross_reference": "",
      "relevance": 1,
      "created": "2020-01-30T23:16:19.674Z",
      "updated": "2020-01-30T23:16:19.742Z"
    }
  ],
  "query_id": "cc1d3677eeafe70929aeccfb462860439f61b051",
  "created": "2020-01-30T23:16:19.677Z",
  "updated": "2020-01-30T23:16:19.677Z"
}

where the document IDs correspond to documents in the collection, e.g. the ones that contain the word 'cloud'.
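For reference, a minimal sketch of how a training query in that shape could be assembled before it is passed to add_training_data(). The helper name build_training_query is hypothetical and not part of the SDK; the field names are taken from the JSON above, and the server-side fields (created, updated, query_id) are left out on purpose.

```python
def build_training_query(query_text, relevant_ids, relevance=1):
    """Assemble a training query in the shape shown in the JSON above.

    Hypothetical helper, not an SDK function. Server-generated fields
    (created, updated, query_id) are omitted.
    """
    return {
        "natural_language_query": query_text,
        "filter": "",
        "examples": [
            {"document_id": doc_id, "cross_reference": "", "relevance": relevance}
            for doc_id in relevant_ids
        ],
    }


# Two of the document IDs from the training data above.
payload = build_training_query(
    "hadoop",
    [
        "1ad6f551-e092-4ce9-b08c-eb4f4cbc9458",
        "f1d11f51-31b2-414f-b359-d5336b019575",
    ],
)
```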

With the training data created, I ran the earlier query again with the query text 'hadoop', assuming that Discovery would train itself automatically and return the relevant results (I could not find an API such as 'train()', which I was expecting). However, even after I provided the training examples, the Discovery query still returns nothing.

I have no idea what is going wrong. Any help would be much appreciated.

Upvotes: 0

Views: 199

Answers (1)

Crispim

Reputation: 11

Sandipan,

As mentioned here: Improving result relevance with the API

When you provide a Discovery instance with training data, the service uses machine-learning Watson techniques to find signals in your content and questions. The service then reorders query results to display the most relevant results at the top. As you add more training data, the service instance becomes more accurate and sophisticated in the ordering of results it returns.

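I believe that relevance training won't work the way you need. It only reorders the results that a query already returns; it does not bring back documents that the query does not match in the first place.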
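To illustrate the point, here is a toy sketch of "reorder only" behaviour. It is not Discovery's actual ranking algorithm; the rerank function, the scores, and the document names are all made up for illustration. The idea is just that a reranker works on whatever the query matched, so an empty match set stays empty.

```python
def rerank(matched_docs, learned_scores):
    """Toy reranker: sorts the already-matched documents by a learned score.

    Illustrative only. Documents that the query did not match are never
    added, no matter how high their learned score is.
    """
    return sorted(
        matched_docs,
        key=lambda doc_id: learned_scores.get(doc_id, 0.0),
        reverse=True,
    )


# Hypothetical scores learned from training examples.
learned_scores = {"doc-cloud-1": 0.9, "doc-cloud-2": 0.7}

# 'hadoop' matches no documents, so there is nothing to reorder.
no_matches = rerank([], learned_scores)

# A query that does match documents gets them back in a new order.
reordered = rerank(["doc-cloud-2", "doc-cloud-1"], learned_scores)
```

Here no_matches is still an empty list, while reordered puts "doc-cloud-1" first.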

Also, "The collection's training-data set must contain at least 49 unique training queries (that is, sets of queries and examples)." You need to add more queries before training will start; a single 'hadoop' query is not enough.
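A quick way to check this on your side is to count the unique query texts in your training data. This is only a sketch: the helper name is hypothetical, it assumes you already have the training queries as a list of dicts shaped like the JSON in your question, and treating queries as unique by exact text is my assumption, not something the documentation spells out.

```python
MIN_UNIQUE_QUERIES = 49  # threshold quoted from the documentation above


def has_enough_training_queries(training_queries, minimum=MIN_UNIQUE_QUERIES):
    """Return True if there are at least `minimum` distinct query texts.

    Hypothetical helper. Uniqueness by exact query text is an assumption.
    """
    unique_texts = {q["natural_language_query"] for q in training_queries}
    return len(unique_texts) >= minimum


# The situation in the question: a single training query.
current = [{"natural_language_query": "hadoop", "examples": []}]
ready = has_enough_training_queries(current)
```

With only the 'hadoop' query, ready is False.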

Upvotes: 1
