Reputation: 436
I have a custom Extractor with AtomicFileProcessing set to false. It extracts a large number of JSON files (each line in a file is a JSON document) and outputs two files, one with successful and one with failed requests; both contain the JSON rows (more than 1 AU was allocated for this extraction). The problem is that when I use the same extractor on the files output by that first step with more than one AU, it fails with the error: Unexpected character encountered while parsing value: e. Path '', line 0, position 0.
If I assign 1 AU on Azure, or run this locally with AU set to more than 1, the data is processed successfully. Is this behavior caused by more than one AU being given to a single JSON file, which can't be parallelized because the file is in a non-splittable format?
Upvotes: 0
Views: 39
Reputation: 1138
You can solve this problem by converting your JSON file to JSON Lines:
http://jsonlines.org/examples/
Then read the file using the text extractor and use the JsonFunctions available in Microsoft.Analytics.Samples.Formats to parse the JSON.
That transformation makes your file splittable, so it can be parallelized!
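For the conversion step, here is a minimal Python sketch of what "converting to JSON Lines" could look like. It is not U-SQL and not part of the Samples.Formats library. The function names and the sample records are made up for illustration. It only shows that once every record sits on its own line, any group of whole lines can be parsed independently of the rest.

```python
import json


def to_json_lines(text):
    """Convert a JSON array (or a single JSON document) into JSON Lines text.

    Hypothetical helper for illustration only.
    """
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


def parse_chunk(lines):
    """Parse a group of whole lines on its own, as one parallel worker might."""
    return [json.loads(line) for line in lines if line.strip()]


# Made-up sample data standing in for the extractor output.
sample = '[{"id": 1, "status": "ok"}, {"id": 2, "status": "failed"}]'
jsonl = to_json_lines(sample)

# Split on line boundaries and parse each part separately.
lines = jsonl.splitlines()
first_half = parse_chunk(lines[:1])
second_half = parse_chunk(lines[1:])
```

The important part is that the split happens on line boundaries: a chunk that starts in the middle of a record would fail to parse, much like the error in the question.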
Upvotes: 0