Reputation: 31
I am trying to build a classifier using GCP AutoML. I've successfully created the dataset, but when training I get the following error:
Training pipeline failed with error message: Less than 50% of rows successfully generated examples.
This is an imbalanced classification problem, so I am optimising for AUC PRC. Also, the data split is done using a date column.
Any ideas why I am getting this error and how to solve it?
Upvotes: 1
Views: 214
Reputation: 298
I realise this is a very old question, but in my case I think the problem was caused by the timestamp column (used for splitting the data) having more than one format in my source BigQuery table: some values looked like 2022-04-16 07:32:25.810000 UTC and others like 2022-04-29 22:20:05 UTC.
Truncating the timestamps so that they all have the same precision (as in the BigQuery snippet below) fixed the issue.
SELECT
...
-- Truncate to the minute so every value has the same precision
TIMESTAMP_TRUNC(timestamp, MINUTE) AS timestamp
...
FROM ...
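If you want to confirm that mixed formats are actually present before changing the query, a quick check on a sample of exported values might look like the sketch below. This is only an illustration: the helper names (`timestamp_shape`, `count_shapes`) and the two regex patterns are my own assumptions, based on the two example values above, and are not part of AutoML or BigQuery.

```python
import re
from collections import Counter

# Hypothetical patterns matching the two formats quoted in this answer.
_FRACTIONAL = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ UTC$")
_WHOLE_SECONDS = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} UTC$")


def timestamp_shape(value: str) -> str:
    """Label a timestamp string by its textual shape."""
    if _FRACTIONAL.match(value):
        return "fractional_seconds"
    if _WHOLE_SECONDS.match(value):
        return "whole_seconds"
    return "other"


def count_shapes(values):
    """Count how many values fall into each shape."""
    return Counter(timestamp_shape(v.strip()) for v in values)


# Sample values taken from the examples above.
sample = [
    "2022-04-16 07:32:25.810000 UTC",
    "2022-04-29 22:20:05 UTC",
]

shapes = count_shapes(sample)
print(dict(shapes))
if len(shapes) > 1:
    print("Mixed timestamp formats found; consider normalising them before training.")
```

More than one key in the resulting counter means the column mixes formats, which is the situation the truncation above is meant to remove.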
Upvotes: 0