Reputation: 781
Looks like it's a known issue, but my scenario is slightly different. The data source of my Athena table is an S3 bucket, and a Glue crawler job reads against it to build the table. The SerDe serialization library is set to org.openx.data.jsonserde.JsonSerDe automatically.
Besides fixing the error, I'm also wondering how I can find which entry or S3 file caused the problem. I can't check the S3 files manually one by one because there are too many, and I can't query the table in Athena. Thanks.
Error message:
HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Illegal escape. at 99 [character 100 line 1]
Upvotes: 0
Views: 2767
Reputation: 2956
You could ignore malformed JSON; this ensures that you can still query the table, as described in the Best Practices for Reading JSON Data by AWS:
...
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://bucket/path/';
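For completeness, a full DDL statement might look like the sketch below. The table name, column names, and bucket path are placeholders, not taken from your setup; adjust them to match what the crawler generated:

```sql
-- Hypothetical table definition; columns and LOCATION are illustrative only
CREATE EXTERNAL TABLE my_database.my_table (
  id string,
  payload string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://bucket/path/';
```

Since your table was created by a Glue crawler, you can also add the `ignore.malformed.json` SerDe parameter to the existing table through the Glue console instead of recreating it, so the crawler's schema is preserved.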
I would start with this to ensure that at least some data can be read.
Athena can also show the underlying file of a record via the pseudo-column "$path":
SELECT "$path" FROM "my_database"."my_table"
This opens up the possibility of counting how many rows each file has and then checking which file Athena can't fully read.
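Building on that, a per-file row count could be sketched as follows (this assumes `ignore.malformed.json` is enabled so the query succeeds; files whose count is lower than expected likely contain the malformed rows):

```sql
-- Count the rows Athena can read from each underlying S3 object
SELECT "$path" AS source_file,
       COUNT(*) AS readable_rows
FROM "my_database"."my_table"
GROUP BY "$path"
ORDER BY readable_rows ASC;
```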
Upvotes: 1