Reputation: 145
I am trying to get the data of Bogota from OpenAQ in AWS Athena, using the following query.
SELECT *
FROM openaq
WHERE city='Bogota'
I get following referring to a malformed JSON.
Row is not a valid JSON Object - JSONException: Unterminated string at 201 [character 202 line 1]
Is there a way to ignore the rows with malformed JSONs in the query?
Upvotes: 0
Views: 870
Reputation: 21
I have tried adjusting the table template from this source using this line of code as per this thread.
'ignore.malformed.json'='true'
So the new table template query reads:
CREATE EXTERNAL TABLE `openaq`(
`date` struct<utc:string,local:string> COMMENT 'from deserializer',
`parameter` string COMMENT 'from deserializer',
`location` string COMMENT 'from deserializer',
`value` float COMMENT 'from deserializer',
`unit` string COMMENT 'from deserializer',
`city` string COMMENT 'from deserializer',
`attribution` array<struct<name:string,url:string>> COMMENT 'from deserializer',
`averagingperiod` struct<unit:string,value:float> COMMENT 'from deserializer',
`coordinates` struct<latitude:float,longitude:float> COMMENT 'from deserializer',
`country` string COMMENT 'from deserializer',
`sourcename` string COMMENT 'from deserializer',
`sourcetype` string COMMENT 'from deserializer',
`mobile` string COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'ignore.malformed.json'='true')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://openaq-fetches/realtime-gzipped'
TBLPROPERTIES (
'transient_lastDdlTime'='1518373755')
This seems to solve the error for me.
Upvotes: 2