SBB

Reputation: 8970

Query records from an AWS Kendra index using the SDK

I am testing out AWS Kendra for a business use-case and I am having trouble figuring out how to query data in the index to ensure data accuracy.

The data source is a connection to our Salesforce instance, which contains over 1,000 knowledge articles.

The syncing of data appears to be working and we can see that the document count is 384.

Now, because we have over 1,000 possible articles, we have restricted our API user that is connecting Kendra to Salesforce to only be able to access specific articles.

Before we move forward, we want to ensure that the articles indexed are what we expect and have allowed the API user to bring over.

What I am now trying to do is audit / export the records that are in the index so I can compare them to the records we expect to see from the source.
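The comparison described above could be sketched roughly as follows. This is only an illustration: the function name `diffIds` and the sample IDs are hypothetical, and it assumes both the expected and the exported records can be reduced to plain arrays of ID strings.

```javascript
// Sketch: compare the IDs expected in the index with the IDs actually exported.
// `diffIds` and the sample IDs below are hypothetical, for illustration only.
function diffIds(expectedIds, indexedIds) {
    const expected = new Set(expectedIds);
    const indexed = new Set(indexedIds);
    return {
        // Expected from the source but not found in the index
        missing: [...expected].filter((id) => !indexed.has(id)),
        // Found in the index but not on the expected list
        unexpected: [...indexed].filter((id) => !expected.has(id)),
    };
}

const report = diffIds(["a1", "a2", "a3"], ["a2", "a3", "a4"]);
console.log(report);
```

Using sets keeps each lookup cheap, so the same approach should hold up for a few hundred or a few thousand articles.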

For this, I am using the JavaScript SDK @aws-sdk/client-kendra.

I wrote a very basic test that tries to query all of the records by filtering on an attribute they all have in common: _language_code.

Code Example:

const {
    KendraClient,
    QueryCommand
} = require("@aws-sdk/client-kendra");
const {
    fromIni
} = require("@aws-sdk/credential-provider-ini");
const client = new KendraClient({
    credentials: fromIni({
        profile: 'ccs-account'
    })
});
const fs = require('fs');

const index = "e65cacb1-5492-4760-84aa-7c6faa407455";
const pageSize = 100;

let currentPage = 1;
let totalResults;
let results = [];

/**
 * Init
 */
const go = async () => {

    let params = getParams(currentPage); // 1 works fine, 100 results returned. 2 returns 0 results
    const command = new QueryCommand(params);
    const response = await client.send(command);

    totalResults = response.TotalNumberOfResults;
    results = response.ResultItems;
    
    // Write results to json
    fs.writeFile('data.json', JSON.stringify(results, null, 4), (err) => {
        if (err) throw err;
    });

}

/**
 * Get params for query
 * @param {*} page 
 * @returns 
 */
function getParams(page) {

    return  {
        IndexId: index,
        PageSize: pageSize,
        PageNumber: page,
        AttributeFilter: {
            "EqualsTo": {
                "Key": "_language_code",
                "Value": {
                    "StringValue": "en"
                }
            }
        },
        SortingConfiguration: {
            "DocumentAttributeKey": "_document_title",
            "SortOrder": "ASC"
        }
    };
}

// Run
go();

The Problem / Question:

From what I can see in the documentation, the query params accept PageNumber and PageSize, which suggests that the results are paginated.

When I query with PageNumber=1 and PageSize=100, I get 100 records as expected. Since the page size limit seems to be 100 results, my assumption was that I could change PageNumber to 2 to get the next 100 results, and repeat this process until I have retrieved all of the records so I can QA the data.

I am at a loss as to why 0 records are returned when I target the second page. With 384 documents, there should be 3 pages of 100 results and 1 page of 84 results.

Any thoughts on what I am missing here? Is there a simpler way to export the indexed data to perform such analysis?

Thanks!

Upvotes: 2

Views: 1036

Answers (1)

Kan Wang

Reputation: 1

Please refer to the API documentation: https://docs.aws.amazon.com/kendra/latest/dg/API_Query.html

Each query returns the 100 most relevant results.

So you can't get past the top 100 results by requesting a second page. If you need more results, please request a limit increase: https://docs.aws.amazon.com/kendra/latest/dg/quotas.html

Maximum number of search results per query. Default is 100. To enable more than 100 results, see Quotas Support
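To illustrate what that limit means for paging, here is a minimal sketch of a loop that stops once the cap is reached. The helper name `collectPages` and the `fetchPage` callback are hypothetical. The callback is assumed to wrap the Query call and return an object shaped like the SDK response (`ResultItems`), and the default `maxResults` of 100 simply mirrors the default quota quoted above.

```javascript
// Sketch: page through query results until the per-query result cap is reached.
// `collectPages` and `fetchPage` are hypothetical names, not part of the AWS SDK.
// fetchPage(pageNumber, pageSize) is assumed to resolve to { ResultItems: [...] }.
async function collectPages(fetchPage, pageSize, maxResults = 100) {
    const items = [];
    let page = 1;
    while (items.length < maxResults) {
        const response = await fetchPage(page, pageSize);
        const batch = response.ResultItems || [];
        if (batch.length === 0) break;      // nothing more is available
        items.push(...batch);
        if (batch.length < pageSize) break; // a short page is the last page
        page += 1;
    }
    return items.slice(0, maxResults);
}

// Possible wiring with the client from the question (not run here):
// const items = await collectPages(
//     (page, size) => client.send(new QueryCommand({
//         IndexId: index, PageNumber: page, PageSize: size
//     })),
//     25
// );
```

With a page size of 25 this would make four requests and then stop, because a fifth page would fall beyond the 100-result cap. It does not work around the limit; it only avoids asking for pages that cannot contain anything.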

Upvotes: 0
