Reputation: 8970
I am testing out AWS Kendra for a business use-case and I am having trouble figuring out how to query data in the index to ensure data accuracy.
The data source is our Salesforce instance, which contains over 1,000 knowledge articles.
The sync appears to be working, and we can see that the document count is 384.
Now, because we have over 1,000 possible articles, we have restricted our API user that is connecting Kendra to Salesforce to only be able to access specific articles.
Before we move forward, we want to ensure that the articles indexed are what we expect and have allowed the API user to bring over.
What I am now trying to do is audit / export the records that are in the index so I can compare them to the records we expect to see from the source.
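The comparison itself could look something like the sketch below, assuming both sides can be reduced to arrays of ID strings. `diffIds` and the sample IDs are hypothetical names made up for illustration; they are not part of the Kendra SDK or of Salesforce.

```javascript
// Hypothetical helper: compare the IDs we expect (from the source)
// with the IDs actually found in the index.
function diffIds(expectedIds, indexedIds) {
  const expected = new Set(expectedIds);
  const indexed = new Set(indexedIds);
  return {
    // Expected by us, but not present in the index
    missing: [...expected].filter((id) => !indexed.has(id)),
    // Present in the index, but not in our expected list
    unexpected: [...indexed].filter((id) => !expected.has(id)),
  };
}

// Made-up sample IDs, purely for illustration
const report = diffIds(["ka-1", "ka-2", "ka-3"], ["ka-2", "ka-3", "ka-9"]);
console.log(report.missing);    // [ 'ka-1' ]
console.log(report.unexpected); // [ 'ka-9' ]
```

Using sets keeps each lookup cheap and ignores duplicate IDs on either side.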
To pull the records, I am using the JavaScript SDK @aws-sdk/client-kendra.
I wrote a very basic test to query all of the records that share one attribute value: _language_code.
Code Example:
const {
    KendraClient,
    QueryCommand
} = require("@aws-sdk/client-kendra");
const {
    fromIni
} = require("@aws-sdk/credential-provider-ini");
const client = new KendraClient({
    credentials: fromIni({
        profile: 'ccs-account'
    })
});
const fs = require('fs');
const index = "e65cacb1-5492-4760-84aa-7c6faa407455";
const pageSize = 100;
let currentPage = 1;
let totalResults;
let results = [];
/**
 * Init
 */
const go = async () => {
    let params = getParams(currentPage); // 1 works fine, 100 results returned. 2 returns 0 results
    const command = new QueryCommand(params);
    const response = await client.send(command);
    totalResults = response.TotalNumberOfResults;
    results = response.ResultItems;
    // Write results to json
    fs.writeFile('data.json', JSON.stringify(results, null, 4), (err) => {
        if (err) throw err;
    });
};
/**
 * Get params for query
 * @param {*} page
 * @returns
 */
function getParams(page) {
    return {
        IndexId: index,
        PageSize: pageSize,
        PageNumber: page,
        AttributeFilter: {
            "EqualsTo": {
                "Key": "_language_code",
                "Value": {
                    "StringValue": "en"
                }
            }
        },
        SortingConfiguration: {
            "DocumentAttributeKey": "_document_title",
            "SortOrder": "ASC"
        }
    };
}
// Run
go();
The Problem / Question:
From what I can see in the documentation, the params accept PageNumber and PageSize, which suggests the results are paginated.
When I query with PageNumber=1 and PageSize=100, I get 100 records as expected. Since the page size limit seems to be 100 results, I assumed I could change to PageNumber=2 and get the next 100 results, repeating this process until I have retrieved all of the records so I can QA the data.
I am at a loss as to why 0 records are returned when I request the second page, as there should be 3 pages of 100 results and 1 page of 84 results.
Any thoughts on what I am missing here? Is there a simpler way to export the indexed data to perform such analysis?
Thanks!
Upvotes: 2
Views: 1036
Reputation: 1
Please refer to the API documentation: https://docs.aws.amazon.com/kendra/latest/dg/API_Query.html
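Each query returns only the 100 most relevant results in total, across all pages.
So you can't get past the top 100 results by requesting a second page. With PageSize=100, page 1 already contains everything the query can return, which is why page 2 is empty. If you need more results, please request a limit increase: https://docs.aws.amazon.com/kendra/latest/dg/quotas.html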
From the quotas page: "Maximum number of search results per query. Default is 100. To enable more than 100 results, see Quotas Support."
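To make the effect of the cap concrete, here is a small sketch of the page arithmetic. It only models the behavior described above and is not how the service is implemented; `itemsOnPage`, `reachablePages`, and the `cap` parameter are hypothetical names, and the value 100 assumes the default quota is still in place.

```javascript
// Assumption: the index still has the default cap of 100 results per query.
const MAX_RESULTS_PER_QUERY = 100;

// How many items a given page can return once the cap is applied.
function itemsOnPage(pageNumber, pageSize, totalMatches, cap = MAX_RESULTS_PER_QUERY) {
  const reachable = Math.min(totalMatches, cap);
  const start = (pageNumber - 1) * pageSize;
  return Math.max(0, Math.min(pageSize, reachable - start));
}

// How many non-empty pages exist for a given page size.
function reachablePages(pageSize, totalMatches, cap = MAX_RESULTS_PER_QUERY) {
  return Math.ceil(Math.min(totalMatches, cap) / pageSize);
}

console.log(itemsOnPage(1, 100, 384)); // 100
console.log(itemsOnPage(2, 100, 384)); // 0, page 2 starts past the cap
console.log(reachablePages(25, 384));  // 4, smaller pages still stop at 100 items
```

If the quota were raised (for example, passing a hypothetical `cap` of 500), `itemsOnPage(4, 100, 384, 500)` gives 84, which matches the 3 pages of 100 plus 1 page of 84 that the question expected.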
Upvotes: 0