Reputation: 43
Currently I am trying to read and unpivot CSV files with unknown column names on Microsoft Azure. To do this I am using a Data Factory with two Data Lake Analytics activities: the first activity generates a script that reads and unpivots the data, and the second activity just executes this script. My problem is that sometimes the scripts generated by the first activity are too big:
"The provided U-SQL script is 6449969 bytes long, which exceeds the size limit of 2097152 bytes."
My idea was to split them, but I think it is not possible to run more than one script in one activity. Since I don't know into how many parts the script would be divided, I cannot just add a fixed number of activities.
Any suggestions?
Upvotes: 0
Views: 387
Reputation: 312
The only way to work around this limitation at this point is to write a custom extractor. However, you will have to expose the data not as a string but as a byte[].
If you use a custom extractor that just reads the byte array, you can go up to 4 MB.
In general, if you need to parse your rows, it will probably be faster to write your own custom extractor than to use the built-in extractor and then write another U-SQL transformation or two to parse the data (again).
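As a rough illustration only, here is a minimal sketch of what such a byte[]-based extractor could look like in a U-SQL code-behind or class library, assuming the usual Microsoft.Analytics.Interfaces assembly; the class name, column name, and chunk size are made up for this example:

```csharp
using Microsoft.Analytics.Interfaces;
using System.Collections.Generic;

// Illustrative sketch: a pass-through extractor that emits the input as raw
// byte[] chunks instead of strings, since a byte[] cell can hold up to 4 MB.
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class RawByteExtractor : IExtractor
{
    // Chunk size chosen to stay within the 4 MB byte[] cell limit (hypothetical choice).
    private const int ChunkSize = 4 * 1024 * 1024;

    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        var buffer = new byte[ChunkSize];
        int read;
        while ((read = input.BaseStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Copy only the bytes actually read so the emitted cell has the right length.
            var chunk = new byte[read];
            System.Array.Copy(buffer, chunk, read);

            output.Set<byte[]>("data", chunk);
            yield return output.AsReadOnly();
        }
    }
}
```

In the U-SQL script you would then declare the column as byte[] in the EXTRACT schema and reference the extractor with something like USING new YourNamespace.RawByteExtractor() (namespace and names again hypothetical), and do the actual parsing/unpivoting in subsequent U-SQL steps. AtomicFileProcessing = true is used here so that each file is processed by a single vertex, which avoids the input being split at arbitrary boundaries.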
Maybe you can refer to this repo for some insights: https://github.com/Azure/usql/tree/mrys-json
Upvotes: 1