Rory McCrossan
Rory McCrossan

Reputation: 337560

Duplicates returned in query on child object

I have a data structure in my document like this (note this is simplified for brevity):

{
    "id": "c1c1c1c1-c1c1-c1c1-c1c1-c1c1c1c1c1c1",
    "name": "Bruce Banner",
    "accountId": "a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1",
    "contributors": [{
        "accountId": "a2a2a2a2-a2a2-a2a2-a2a2-a2a2a2a2a2a2",
        "type": "Foo"
    },{
        "accountId": "a3a3a3a3-a3a3-a3a3-a3a3-a3a3a3a3a3a3",
        "type": "Bar"
    }]
},
{
    "id": "c2c2c2c2-c2c2-c2c2-c2c2-c2c2c2c2c2c2",
    "name": "Tony Stark",
    "accountId": "a2a2a2a2-a2a2-a2a2-a2a2-a2a2a2a2a2a2",
    "contributors": [{
        "accountId": "a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1",
        "type": "Fizz"
    }]
},

I am attempting to write a query which retrieves documents where the provided accountId is either on the parent record or within the contributors array:

SELECT e.id, e.accountId, e.name
FROM Entitity e
JOIN co IN e.contributors
WHERE e.accountId = 'a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1'
OR co.accountId = 'a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1'

Result:

[{
    "id": "c1c1c1c1-c1c1-c1c1-c1c1-c1c1c1c1c1c1",
    "accountId": "a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1",
    "name": "Bruce Banner"
},{
    "id": "c1c1c1c1-c1c1-c1c1-c1c1-c1c1c1c1c1c1",
    "accountId": "a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1",
    "name": "Bruce Banner"
},{
    "id": "c2c2c2c2-c2c2-c2c2-c2c2-c2c2c2c2c2c2",
    "accountId": "a2a2a2a2-a2a2-a2a2-a2a2-a2a2a2a2a2a2",
    "name": "Tony Stark"
}]

As you can see, the first entity (Bruce Banner) is duplicated. If I remove the JOIN clause it works correctly. Can anyone tell me why this is, and how I can avoid the duplication?

Edit - For clarity, this is my expected response:

[{
    "id": "c1c1c1c1-c1c1-c1c1-c1c1-c1c1c1c1c1c1",
    "accountId": "a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1",
    "name": "Bruce Banner"
},{
    "id": "c2c2c2c2-c2c2-c2c2-c2c2-c2c2c2c2c2c2",
    "accountId": "a2a2a2a2-a2a2-a2a2-a2a2-a2a2a2a2a2a2",
    "name": "Tony Stark"
}]

Upvotes: 1

Views: 621

Answers (1)

Andrew Liu
Andrew Liu

Reputation: 8119

I see you are trying to query on whether the accountId or the contributors's accountId is equal to some value.

Today - you need to use the JOIN operator to perform a cross-product in order to query across all elements within a JSON array (note: you do not need a JOIN to query on a specific array index, e.g. WHERE e.contributors[0].accountId = 'a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1').

In your example abpve, you will get duplicates from the query which is unavoidable. You will want to implement some application logic to filter duplicates out of the query result.

To get a better picture in to the behavior of JOIN (think simple cross product), try adding a field from the array you are producing a cross product with (e.g. co.type):

SELECT e.id, e.accountId, e.name, co.type
FROM Entitity e
JOIN co IN e.contributors
WHERE e.accountId = 'a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1'
OR co.accountId = 'a1a1a1a1-a1a1-a1a1-a1a1-a1a1a1a1a1a1'

which results in:

[{
    id: c1c1c1c1 - c1c1 - c1c1 - c1c1 - c1c1c1c1c1c1,
    accountId: a1a1a1a1 - a1a1 - a1a1 - a1a1 - a1a1a1a1a1a1,
    name: Bruce Banner,
    type: Foo
}, {
    id: c1c1c1c1 - c1c1 - c1c1 - c1c1 - c1c1c1c1c1c1,
    accountId: a1a1a1a1 - a1a1 - a1a1 - a1a1 - a1a1a1a1a1a1,
    name: Bruce Banner,
    type: Bar
}, {
    id: c2c2c2c2 - c2c2 - c2c2 - c2c2 - c2c2c2c2c2c2,
    accountId: a2a2a2a2 - a2a2 - a2a2 - a2a2 - a2a2a2a2a2a2,
    name: Tony Stark,
    type: Fizz
}]

As you can see from the results - a record is being returned for each of the children: Foo, Bar, and Fizz. That is because each of those array elements match the specified query.

Upvotes: 1

Related Questions