Form Recognizer V2 / Costs are exploding

Question

In response to ChadZ's answer, here is the Form Recognizer metric I'm talking about: Form Recognizer Metrics. In our test we check a directory for files and analyze them sequentially: we wait for each response, write the results, get the next file, and so on. No multithreading.

Have a look at the biggest spike, on April 14, with 15330 calls. If we assume that each call on April 14 took 10 seconds (which would be fast; normally it can take up to a minute), those analyses would have taken 153300 seconds, which is 2555 minutes or 42.58 hours, more than a day has. Even if each analysis took only 5 seconds, that would be more than 20 hours of back-to-back sequential processing.
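The same back-of-the-envelope estimate as a quick calculation, assuming strictly sequential calls with the per-call durations used above (the function name is just for illustration):

```python
SECONDS_PER_HOUR = 3600

def sequential_hours(calls, sec_per_call):
    """Wall-clock hours needed if every call runs strictly one after another."""
    return calls * sec_per_call / SECONDS_PER_HOUR

calls_on_spike_day = 15330  # biggest spike in the metrics (April 14)
for sec_per_call in (10, 5):
    hours = sequential_hours(calls_on_spike_day, sec_per_call)
    print(f"{sec_per_call} s per call -> {hours:.2f} hours")
# 10 s per call -> 42.58 hours
# 5 s per call -> 21.29 hours
```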

Of course I could be wrong, but currently the most logical explanation is that GET requests are also tracked and billed.

Original Post

I'm using a custom model with labels (created with the sample labeling tool) and getting the results with the "Python Form Recognizer Async Analyze" V2 SDK code from the bottom of this page. While the asynchronous analysis in V2 is much slower than V1 (which I described here), it also seems much, much more expensive.

The original example code to get the result after a POST API call looks like this:

import json
import time
from requests import get

# get_url (the "operation-location" header returned by the POST call) and
# apim_key (the subscription key) are defined earlier in the sample.
n_tries = 15
n_try = 0
wait_sec = 5
max_wait_sec = 60
while n_try < n_tries:
    try:
        resp = get(url=get_url, headers={"Ocp-Apim-Subscription-Key": apim_key})
        resp_json = resp.json()
        if resp.status_code != 200:
            print("GET analyze results failed:\n%s" % json.dumps(resp_json))
            quit()
        status = resp_json["status"]
        if status == "succeeded":
            print("Analysis succeeded:\n%s" % json.dumps(resp_json))
            quit()
        if status == "failed":
            print("Analysis failed:\n%s" % json.dumps(resp_json))
            quit()
        # Analysis still running. Wait and retry.
        time.sleep(wait_sec)
        n_try += 1
        # Back off: double the wait after each attempt, capped at max_wait_sec.
        wait_sec = min(2 * wait_sec, max_wait_sec)
    except Exception as e:
        msg = "GET analyze results failed:\n%s" % str(e)
        print(msg)
        quit()
print("Analyze operation did not complete within the allocated time.")

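As you can see, the original example code polls repeatedly for the result: after each unfinished GET request it waits, starting at 5 seconds and doubling the wait each time up to a maximum of 60 seconds, for at most 15 attempts.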
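To see how many GET requests this loop issues per document, here is a small simulation of its schedule. It is an idealized sketch: it assumes each GET returns instantly and that the result becomes available exactly once the analysis time has elapsed. The function names are made up for illustration.

```python
def poll_waits(n_tries=15, wait_sec=5, max_wait_sec=60):
    """Sleep durations the sample loop uses after each unfinished GET."""
    waits = []
    for _ in range(n_tries):
        waits.append(wait_sec)
        wait_sec = min(2 * wait_sec, max_wait_sec)  # same backoff as the sample
    return waits

def get_requests_for(analysis_sec, n_tries=15, wait_sec=5, max_wait_sec=60):
    """Number of GET requests issued until the result is ready (or the loop gives up)."""
    elapsed = 0
    requests_made = 0
    for wait in poll_waits(n_tries, wait_sec, max_wait_sec):
        requests_made += 1            # one GET at time `elapsed`
        if elapsed >= analysis_sec:   # status == "succeeded"
            return requests_made
        elapsed += wait               # time.sleep(wait_sec)
    return requests_made              # gave up after n_tries GETs

for analysis_sec in (5, 10, 30, 60):
    print(analysis_sec, "s ->", get_requests_for(analysis_sec), "GET requests")
```

Under these assumptions, a document that takes 10 seconds to analyze causes one POST plus three GET requests, and one that takes a minute causes one POST plus five.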

My problem: it seems to me that not only the API call that analyzes a document is billed, but also every GET request that fetches the results.

Our bill has increased tenfold and more since we switched to V2. We're currently in the testing phase and usually process about 400-500 documents per month, which were correctly tracked and billed in V1. With V2 and the sample code above we now have 63690 (!!!!!) calls, each call is billed, and costs are exploding.

Can anybody confirm this behaviour?

Personally, I'd like to get back the synchronous operation, where the response to the API call already contains the result of the document analysis:

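import requests

def analyze_v1(base_url, model_id, filepath, headers):
    # Hypothetical function name; the post only shows the body.
    # In V1 the POST response already contains the analysis result.
    try:
        url = base_url + "/models/" + model_id + "/analyze"
        with open(filepath, "rb") as f:
            data_bytes = f.read()
        response = requests.post(url=url, data=data_bytes, headers=headers)
        return response.json()
    except Exception as e:
        print(str(e))
        return None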

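Unfortunately, this doesn't work anymore. In V2 the POST call returns 202 with an operation-location header, and the result has to be fetched with separate GET requests: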

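import requests

def form_recognizerv2_analyze(post_url, data_bytes, headers, subscription_key):
    # Hypothetical function name; the post only shows the body.
    # form_recognizerv2_getdata is the polling function (not shown).
    try:
        response = requests.post(url=post_url, data=data_bytes, headers=headers)  # , params=params)
        if response.status_code != 202:
            return None
        # Success: V2 returns 202 plus a URL to poll for the result
        get_url = response.headers["operation-location"]
        return form_recognizerv2_getdata(get_url, subscription_key)
    except Exception as e:
        print("POST analyze failed:\n%s" % str(e))
        return None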

ChaZ · Accepted Answer

I can confirm that in Form Recognizer v2, GET calls are not billed. Train calls are free too. If there's a billing issue, please contact customer service.
