Timothée HENRY

Reputation: 14614

AWS Comprehend custom classification job output has more rows than input

I have used AWS Comprehend to train an NLP model. The prediction on the test set runs successfully, but the output file has more rows than the input:

input: 1000 rows

output: 2082 rows

Output looks like this:

predictions.json <...>
{"File": "test.csv", "Line": "0", "Classes": [{"Name": "No", "Score": 0.7022}, {"Name": "Yes", "Score": 0.2892}, {"Name": "tag", "Score": 0.0086}]}
{"File": "test.csv", "Line": "1", "Classes": [{"Name": "No", "Score": 0.6252}, {"Name": "Yes", "Score": 0.3747}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2", "Classes": [{"Name": "No", "Score": 0.9295}, {"Name": "Yes", "Score": 0.0705}, {"Name": "tag", "Score": 0.0}]}
{"File": "test.csv", "Line": "3", "Classes": [{"Name": "No", "Score": 0.5247}, {"Name": "Yes", "Score": 0.4753}, {"Name": "tag", "Score": 0.0}]}
...
{"File": "test.csv", "Line": "2080", "Classes": [{"Name": "No", "Score": 0.8528}, {"Name": "Yes", "Score": 0.1471}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2081", "Classes": [{"Name": "No", "Score": 0.5318}, {"Name": "Yes", "Score": 0.4682}, {"Name": "tag", "Score": 0.0}]}

Can anyone help me understand why this happens and how to use the output?

Upvotes: 0

Views: 613

Answers (3)

Maria Frances Gaska

Reputation: 1

In my case, besides the file not being UTF-8, the problem was also the presence of carriage returns (\r) in the text.
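
A minimal sketch of the cleanup, assuming stray line-break characters inside a document are what inflate the row count. The helper name flatten_document is hypothetical; it replaces every run of \r and \n inside one document with a single space, so that each document ends up on exactly one line:

```python
import re


def flatten_document(text: str) -> str:
    """Collapse every line break inside one document into a single space.

    Hypothetical helper: call it on each document before writing the
    one-document-per-line input file.
    """
    # Runs of \r and \n (so \r\n, a lone \r, or a lone \n) become one space.
    return re.sub(r"[\r\n]+", " ", text).strip()
```

Apply it to each document first, then join the cleaned documents with a single "\n" when writing the test file.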

Upvotes: 0

Karan Gaur

Reputation: 859

I faced the same issue. In my case, the cause was that the prediction file (test.csv in your case) was not in the required encoding: AWS Comprehend requires UTF-8.
AWS Docs Link
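
A rough sketch of how to check and fix the encoding locally before uploading. The function names are hypothetical, and the default source encoding (cp1252) is only an assumption; replace it with whatever your file is actually saved in:

```python
from pathlib import Path


def is_valid_utf8(path) -> bool:
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Rewrite src as UTF-8 into dst.

    source_encoding is an assumption; a wrong guess produces garbled text.
    Bytes are read and written directly so line endings are left untouched.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

For example, if is_valid_utf8("test.csv") returns False, write a converted copy with reencode_to_utf8("test.csv", "test_utf8.csv") and run the job on the copy.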

Upvotes: 2

Alvaro Romero Diaz

Reputation: 320

One option is to put each sentence in its own file and use the whole folder as the test set, setting the option:

 "InputFormat": "ONE_DOC_PER_FILE"

Another option is to count how many '\n' characters there are in the dataset; extra line breaks inside the text could be the cause of the additional rows.
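
Both options can be sketched roughly as below. This assumes test.csv is a regular CSV in which quoted fields may contain line breaks; the function names and the doc_00000.txt file naming are hypothetical:

```python
import csv
import io
from pathlib import Path


def count_rows(raw: str):
    """Return (logical CSV records, physical lines) for the file content."""
    # csv.reader keeps a quoted field with embedded line breaks in one record.
    records = sum(1 for _ in csv.reader(io.StringIO(raw, newline="")))
    # str.splitlines treats \n, \r, \r\n and a few other separators
    # as line boundaries, so this counts physical lines.
    physical = len(raw.splitlines())
    return records, physical


def write_one_doc_per_file(documents, out_dir):
    """Write each document to its own UTF-8 file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, doc in enumerate(documents):
        path = out / f"doc_{i:05d}.txt"  # hypothetical naming scheme
        path.write_bytes(doc.encode("utf-8"))
        paths.append(path)
    return paths
```

If count_rows reports more physical lines than records, some documents contain embedded line breaks. In that case, either clean them or write the documents out with write_one_doc_per_file, upload the folder, and run the job with the ONE_DOC_PER_FILE input format.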

Upvotes: 0
