dparkar
dparkar

Reputation: 2132

Azure Data Factory : copying data from CosmosDB to SQL Database fails with "range-indexed" error

I need to move data from Azure CosmosDB (previously known as DocumentDB) into Azure SQL database.

I am using the "Copy Data" resource where I have setup the source and the destination as well as the mapping.

The schedule has been setup for once every hour. The issues :

  1. I am noticing that the no. of instances or "runs" are more than once per hour. Is this because it is failing and it keeps retrying for a certain amount of times ?
  2. The main error why its failing seems to be this :

Activity Copy_dbo_SubscriptionLocator failed: Failure happened on 'Source' side. ErrorCode=UserErrorDocumentDBReadError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=DocumentDb operation failed: Message: {"Errors":["An invalid query has been specified with filters against path(s) that are not range-indexed. Consider adding allow scan header in the request."]}

What changes do I need to make on CosmosDB end to avoid the error ?

Following is my current indexing policy :

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Hash",
                    "dataType": "String",
                    "precision": 3
                }
            ]
        }
    ],
    "excludedPaths": [
        {
            "path": "/OneOfTheColumns/*"
        }
    ]
}

Looks like the issue is with the datetime field which needs to be range indexed as a string. I could possibly use the _ts field in the document. But how do I change the Copy job to convert the query's datetime to epoch time and use its value for _ts field.

Upvotes: 0

Views: 1464

Answers (2)

Jay Gong
Jay Gong

Reputation: 23782

I'm not sure about which language you are using, so I reproduced your issue on my side, please refer to my steps.

My sample documents:

[
    {
        "id": "1",
        "name": "Jay",
        "course": "A",
        "score": 50
    },
    {
        "id": "2",
        "name": "Peter",
        "course": "B",
        "score": 20
    }
]

Index policy:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/name/?",
            "indexes": [
                {
                    "kind": "Hash",
                    "dataType": "String",
                    "precision": 3
                },
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                }
            ]
        }
    ],
    "excludedPaths": [
        {
            "path": "/*"
        }
    ]
}

Query in the source:(SELECT c.id,c.name,c.course,c.score FROM c where c.score>30)

enter image description here

Reproduce your issue:

enter image description here

Range supports efficient equality queries, range queries (using >, <,

=, <=, !=), and ORDER BY queries. ORDER By queries by default also require maximum index precision (-1). The data type can be String or Number.

When your have range query, you need define range index on the field. Please refer to official doc.

So, I modified the above index policy and copied data successfully:

{
    "indexingMode": "consistent",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Range",
                    "dataType": "String",
                    "precision": -1
                },
                {
                    "kind": "Spatial",
                    "dataType": "Point"
                }
            ]
        }
    ],
    "excludedPaths": []
}

enter image description here

enter image description here

Hope it helps you.

Upvotes: 3

Eva Xiao
Eva Xiao

Reputation: 47

You need to update your index policy of the document collection to allow this query.

Say you have a query like "select * from c where c.property>"something", this c.property must have a range index.

You can learn more about the indexing policy here: and here

Upvotes: 2

Related Questions