daniel baird

Reputation: 3

ADF Dataflow issue with source API query

I am trying to set up an Azure Data Factory Dataflow.

The source is a REST API from Monday.com that uses GraphQL. The API requires the POST method with three headers: Content-Type: application/json, API-Version: 2025-01, and Authorization: <API token>.

The request body requires a query to pull the data. Here's an example of my query:

{
"query":"query { boards(ids: **someid**) {items_page(limit: 1) {cursor items {id name column_values {column {title} text}}}}}"
}
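One way to rule out escaping or line-break problems is to build the body programmatically and paste the one-line result into the source settings. A minimal sketch in Python, assuming a placeholder board id (substitute your own):

```python
import json

# Placeholder board id -- substitute your real Monday.com board id.
board_id = 1234567890

# The GraphQL query as a single Python string with no literal newlines.
graphql = (
    f"query {{ boards(ids: {board_id}) "
    "{items_page(limit: 1) {cursor items "
    "{id name column_values {column {title} text}}}}}"
)

# json.dumps escapes the inner quotes and produces a single-line body
# that can be pasted into the REST source's request body.
body = json.dumps({"query": graphql})
print(body)
```

Because json.dumps always emits valid JSON, the printed string is safe to paste as the POST body without hand-escaping quotes.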

When I run the dataflow it fails with the following:

Error:
Spark job failed: {
"text/plain": "{\"runId\":\"7d4ed91d-1716-4d54-84a03c27cde7890a\",\"sessionId\":\"0e29d687-0201-4f4d-b18d-760767a23168\",\"status\":\"Failed\",\"payload\":{\"statusCode\":400,\"shortMessage\":\"com.microsoft.dataflow.broker.InvalidOperationException: DSL compilation failed: DF-DSL-001 - DSL stream has parsing errors\\nLine 8 Position 11: body: '{ \\nmismatched input ''' expecting {DECIMAL_LITERAL, HEX_LITERAL, OCT_LITERAL, BINARY_LITERAL, MAX_INT, MIN_INT, MAX_LONG, MIN_LONG, POSITIVE_INF, NEGATIVE_INF, '-', '!', '$', '~', ':', '(', '#', '[', '@(', '[]', FLOAT_LITERAL, HEX_FLOAT_LITERAL, STRING_LITERAL, REGEX_LITERAL, 'parameters', 'functions', 'stores', 'as', 'input', 'output', 'constant', 'expression', 'integer', 'short', 'long', 'double', 'float', 'decimal', 'boolean', 'timestamp', 'date', 'byte', 'binary', 'integral', 'number', 'fractional', 'any', IDENTIFIER, ANY_IDENTIFIER, META_MATCH, '$$', '$$$', '$#', OPEN_INTERPOLATE}\",\"detailedMessage\":\"Failure 2025-02-22 00:25:08.846 failed DebugManager.processJob, run=7d4ed91d-1716-4d54-84a0-3c27cde7890a, errorMessage=com.microsoft.dataflow.broker.InvalidOperationException: DSL compilation failed: DF-DSL-001 - DSL stream has parsing errors\\nLine 8 Position 11: body: '{ \\nmismatched input ''' expecting {DECIMAL_LITERAL, HEX_LITERAL, OCT_LITERAL, BINARY_LITERAL, MAX_INT, MIN_INT, MAX_LONG, MIN_LONG, POSITIVE_INF, NEGATIVE_INF, '-', '!', '$', '~', ':', '(', '#', '[', '@(', '[]', FLOAT_LITERAL, HEX_FLOAT_LITERAL, STRING_LITERAL, REGEX_LITERAL, 'parameters', 'functions', 'stores', 'as', 'input', 'output', 'constant', 'expression', 'integer', 'short', 'long', 'double', 'float', 'decimal', 'boolean', 'timestamp', 'date', 'byte', 'binary', 'integral', 'number', 'fractional', 'any', IDENTIFIER, ANY_IDENTIFIER, META_MATCH, '$$', '$$$', '$#', OPEN_INTERPOLATE}\"}}\n"
} - RunId: 7d4ed91d-1716-4d54-84a0-3c27cde7890a

I have tried placing the query on a single line, but it is rejected by the API source with the following error:

at Source 'source2': Failure to read most recent page request: DF-REST_001 - Error response from server: Some(<html>
<head><title>400 Bad Request</title></head>
<body>
<center><h1>400 Bad Request</h1></center>
<hr><center>cloudflare</center>
</body>
</html>), Status code: 400. Please check your request url and body. (url: https://api.monday.com/v2/, request body: Some({ "query":"query { boards(ids: some id) {items_page(limit: 1) { cursor items { id name column_values { column { title } text } } } }}" }), request method: POST

The dataflow JSON is generated automatically by ADF and shows the following. As you can see, the body string literal is split across script lines, with commas placed after the opening '{ and before the closing }':

"scriptLines": [
                "source(allowSchemaDrift: true,",
                "     validateSchema: false,",
                "     inferDriftedColumnTypes: true,",
                "     format: 'rest',",
                "     timeout: 30,",
                "     headers: ['Content-Type' -> 'application/json', 'API-Version' -> '2025-01', 'Authorization' -> 'Some API Token', 'Accept' -> 'application/json'],",
                "     httpMethod: 'POST',",
                "     body: '{ ",
                "\"query\":\"query { boards(ids: some id) {items_page(limit: 1) { cursor items { id name column_values { column { title } text } } } }}\" ",
                "}',",
                "     paginationRules: ['supportRFC5988' -> 'true'],",
                "     responseFormat: ['type' -> 'json', 'documentForm' -> 'arrayOfDocuments']) ~> source2"
]
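If the multi-line string literal is what trips the DSL parser, one workaround is to edit the pipeline JSON directly so the whole body sits in a single scriptLines entry. A sketch based on the script above (the token and board id are placeholders):

```json
"scriptLines": [
    "source(allowSchemaDrift: true,",
    "     validateSchema: false,",
    "     inferDriftedColumnTypes: true,",
    "     format: 'rest',",
    "     timeout: 30,",
    "     headers: ['Content-Type' -> 'application/json', 'API-Version' -> '2025-01', 'Authorization' -> 'Some API Token', 'Accept' -> 'application/json'],",
    "     httpMethod: 'POST',",
    "     body: '{\"query\":\"query { boards(ids: some id) {items_page(limit: 1) { cursor items { id name column_values { column { title } text } } } }}\"}',",
    "     paginationRules: ['supportRFC5988' -> 'true'],",
    "     responseFormat: ['type' -> 'json', 'documentForm' -> 'arrayOfDocuments']) ~> source2"
]
```

This keeps the body as one string literal, so the data flow script parser never sees a line break inside it.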

Upvotes: 0

Views: 80

Answers (1)

daniel baird

Reputation: 3

Thanks for your response. I built the GraphQL query in the Monday.com API playground, and it works in the body of both Web and Copy Data activities. Unfortunately, the schema column values are repeated from the source, and I need to map each column value {text} to its respective column in an Azure SQL table. I have the process working fine in a pipeline: an initial Web activity stores the cursor in a variable, then an Until activity runs a Copy activity followed by another Web activity/Set Variable to keep updating the cursor, until the condition is met (the cursor comes back null).

The problem with this method is that each iteration takes approximately 15 seconds to run and write the data, with the limit set to 1 result per cursor (this lets the rows be written separately for the repeated column values). I have a large amount of data to query and write, so this would take days.

My idea was to use the dataflow so I could call the source API and then flatten the JSON, in order to re-map the schema before it is written to the Azure SQL table.
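For reference, the cursor loop the pipeline implements can be sketched in Python. `fetch_page` here is a stand-in for the Web activity (a real version would POST a cursor-based GraphQL query to https://api.monday.com/v2/); the loop mirrors the Until activity's stop condition of the cursor coming back null:

```python
def drain_items(fetch_page):
    """Collect all items by following the cursor until it is None.

    fetch_page(cursor) -> (items, next_cursor); a real implementation
    would POST a paginated GraphQL query to the Monday.com API.
    """
    items, cursor = [], None
    while True:
        page_items, cursor = fetch_page(cursor)
        items.extend(page_items)
        if cursor is None:  # same stop condition as the Until activity
            break
    return items

# Fake fetcher simulating three pages of one item each (limit: 1).
pages = {
    None: (["item-1"], "c1"),
    "c1": (["item-2"], "c2"),
    "c2": (["item-3"], None),
}
print(drain_items(lambda c: pages[c]))  # ['item-1', 'item-2', 'item-3']
```

Separating the loop from the HTTP call like this also makes it easy to see why one row per iteration is slow: the per-request latency dominates, so raising the page limit (and flattening afterwards) is where the time savings come from.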

Thanks

Upvotes: 0
