Reputation: 351
How to load data from two separate collections of azure cosmos db to a single azure search index? I need a solution to join the data from two collections in a way similar to inner joining concept of SQL and load that data to azure search service.
I have two collections in azure cosmos db. One for product and sample documents for the same is as below.
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "a4853bf5-9c58-4fb5-a1ff-fc3ab575b4c8",
"name": "New Product",
"createDate": "2018-09-19T10:04:35.1951552Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:46:24.7048358Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5@clients"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "b9b6c3bc-a8f8-470f-ac93-be589eb1da16",
"name": "New Product 2",
"createDate": "2018-09-19T11:02:02.6919008Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-19T11:02:02.6919008Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
{
"description": null,
"links": [],
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"productTypeId": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"id": "98b3647a-3b40-4a00-bd0f-2a397bd48b68",
"name": "New Product 7",
"createDate": "2018-09-20T09:42:28.2913567Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-09-20T09:42:28.2913567Z",
"updatedBy": "00000000-0000-0000-0000-000000000000"
}
Another collection for ProductType with below sample document.
{
"description": null,
"links": null,
"replaces": "00000000-0000-0000-0000-000000000000",
"replacedBy": "00000000-0000-0000-0000-000000000000",
"id": "ccd0bc73-c4a1-41bf-9c96-454a5ba1d025",
"name": "ProductType1_186",
"createDate": "2018-09-18T23:54:43.9395245Z",
"createdBy": "00000000-0000-0000-0000-000000000000",
"updateDate": "2018-10-05T13:29:44.019851Z",
"updatedBy": "DIJdyXMudaqeAdsw1SiNyJKRIi7Ktio5@clients"
}
The product type id is referred in product collection and that is the column which links both the collections.
I want to load the above two collections to the same azure search service index and I expect my field of index to be populated somewhat like below.
Upvotes: 1
Views: 689
Reputation: 405
If you use product id as the key, you can simply point two indexers at the same index, and Azure Search will merge the documents automatically. For example, here are two indexer definitions that would merge their data into the same index:
{
"name" : "productIndexer",
"dataSourceName" : "productDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
{
"name" : "sampleIndexer",
"dataSourceName" : "sampleDataSource",
"targetIndexName" : "combinedIndex",
"schedule" : { "interval" : "PT2H" }
}
Learn more about the create indexer api here
However, it appears that the two collections share the same fields. This means that the fields from the document which gets indexed last will replace the fields from the document that got indexed first. To avoid this, I would recommend replacing the fields that match the 00000000-0000-0000-0000-000000000000
pattern with null
in your Cosmos DB query. For example:
SELECT productTypeId, (createdBy != "00000000-0000-0000-0000-000000000000" ? createdBy : null) as createdBy FROM products
This exact query may not work for your use case. See the query syntax reference for more information.
Please let me know if you have any questions, or something is not working as expected.
Thanks Matt
Upvotes: 2