I have a JSON file that is the output of the Google Cloud Speech-to-Text API, and it is almost identical to the JSON files that were the output of the IBM Watson Speech to Text API. I'm comparing the mean reported confidence and cosine similarity across all of our test files to measure the baseline performance of the two services.
Here's the code I used for IBM Watson:
import pickle
from pprint import pprint
from statistics import mean  # assuming mean is statistics.mean

# read the Python dict back from the pickle file
with open('./00_data/Watson Responses/watson_3.pkl', 'rb') as pkl_file:
    data_response = pickle.load(pkl_file)
# pretty-print the response for inspection
pprint(data_response)
# calculate mean confidence across all alternatives
m = mean(
    a["confidence"] for r in data_response["results"] for a in r["alternatives"]
)
print("Mean confidence:", m)
And here's a sample of the JSON file from Watson:
{'result_index': 0,
 'results': [{'alternatives': [{'confidence': 0.91,
                                'transcript': 'hello hi can i please speak to '
                                              '[redacted]'}],
              'final': True},
             {'alternatives': [{'confidence': 0.89,
                                'transcript': 'yeah this is this is [redacted] hi '
                                              "this is [redacted] i'm calling on "
                                              'behalf of [redacted] '
                                              'on a recorded line on '
                                              "trust you're doing well "}],
However, the nested dictionary from Google (below) is slightly different: it is missing the 'result_index' key. That doesn't seem to affect the first and second levels of the dictionary, but I get a consistent KeyError when iterating over the third level. I can still use data_response['results'][0]['alternatives'][0]['transcript'] to access a single response: 'Hello. Hi. Can I please speak to [redacted]?'
{'results': [{'alternatives': [{'confidence': 0.91283852,
                                'transcript': 'Hello. Hi. Can I please speak '
                                              'to Brian?',
                                'words': [{'confidence': 0.91283858,
                                           'endTime': '1.800s',
                                           'startTime': '1.400s',
                                           'word': 'Hello.'},
When I use the same generator expression as above:
import json
from pprint import pprint
from statistics import mean  # assuming mean is statistics.mean

# read the Python dict back from the JSON file
with open('./00_data/Google Cloud Responses/gcp_2.json', 'rb') as json_file:
    data_response = json.load(json_file)
# pretty-print the response for inspection
pprint(data_response)
# calculate mean confidence across all alternatives
m = mean(
    a["confidence"] for r in data_response["results"] for a in r["alternatives"]
)
print("Mean confidence:", m)
I keep getting KeyError: 'confidence'.
What is the difference between these two dictionaries that I am missing?
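To narrow this down, here is a minimal diagnostic sketch. It assumes (I have not confirmed this against the full file) that some alternatives in the Google response simply omit the 'confidence' key, even though the first one has it. The names sample_response, find_missing_confidence and mean_confidence are hypothetical and only for illustration; they are not part of either API.

```python
from statistics import mean

# Hypothetical miniature of a Google-style response; the second
# alternative deliberately has no "confidence" key.
sample_response = {
    "results": [
        {"alternatives": [{"confidence": 0.91, "transcript": "Hello."}]},
        {"alternatives": [{"transcript": "Hi."}]},
    ]
}


def find_missing_confidence(response):
    """Return (result_index, alternative_index) pairs lacking 'confidence'."""
    return [
        (i, j)
        for i, r in enumerate(response.get("results", []))
        for j, a in enumerate(r.get("alternatives", []))
        if "confidence" not in a
    ]


def mean_confidence(response):
    """Mean over alternatives that report a confidence; None if none do."""
    values = [
        a["confidence"]
        for r in response.get("results", [])
        for a in r.get("alternatives", [])
        if "confidence" in a
    ]
    return mean(values) if values else None


print(find_missing_confidence(sample_response))
print("Mean confidence:", mean_confidence(sample_response))
```

Running find_missing_confidence(data_response) on the real file should show whether any entries lack the key. Note that skipping those entries changes what the mean measures, so the result may not be directly comparable with the Watson figure.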