user8142520

Reputation: 781

Getting HIVE_CURSOR_ERROR while querying the table in Athena

This looks like a known issue, but my scenario is a little different. The data source of my Athena table is an S3 bucket, and I have a crawler job that reads it to build the table. The SerDe serialization library is set to org.openx.data.jsonserde.JsonSerDe automatically.

Besides fixing it, I'm also wondering how I could find which entry or S3 file caused the problem. I can't check the S3 files manually one by one because there is too much data, and I can't query the table in Athena. Thanks.

Error message:



HIVE_CURSOR_ERROR: Row is not a valid JSON Object - JSONException: Illegal escape. at 99 [character 100 line 1]

Upvotes: 0

Views: 2767

Answers (1)

Philipp Johannis

Reputation: 2956

You could tell the SerDe to ignore malformed JSON. This ensures that you can query the table, as described in AWS's Best Practices for Reading JSON Data:

...
 ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
 WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
 LOCATION 's3://bucket/path/';

I would start with this to ensure that at least some data can be read.

Athena can also show the underlying file of each record through the "$path" pseudo-column:

SELECT "$path" FROM "my_database"."my_table"

This opens up the possibility of counting how many rows Athena reads from each file and then checking which file cannot be fully read.
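Once "$path" has narrowed things down to a few suspect files, you could download one and look for the offending line locally. Below is a minimal sketch of that idea, assuming the file holds one JSON object per line and is not compressed. The function name find_malformed_lines and the sample data are made up for illustration, and Python's json module is not the same parser as the one inside the OpenX SerDe, so the two may disagree on edge cases.

```python
import json


def find_malformed_lines(lines):
    """Return (line_number, message) for each line that is not a valid JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        try:
            value = json.loads(text)
        except json.JSONDecodeError as err:
            # err.colno is the 1-based column where parsing failed
            problems.append((number, f"{err.msg} at column {err.colno}"))
            continue
        if not isinstance(value, dict):
            # the SerDe expects every row to be a JSON object
            problems.append((number, "not a JSON object"))
    return problems


# Hypothetical sample: the second line contains an invalid escape (\d),
# the third line is valid JSON but not an object.
sample = [
    '{"a": 1}',
    '{"path": "C:\\dir"}',
    '[1, 2]',
    '',
    '{"b": "ok\\n"}',
]

for number, message in find_malformed_lines(sample):
    print(f"line {number}: {message}")
```

For a real file, pass an open file handle instead of the sample list, for example find_malformed_lines(open("downloaded.json", encoding="utf-8")), where downloaded.json stands for whatever file you fetched from S3.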

Upvotes: 1
