markthegrea
markthegrea

Reputation: 3841

Athena returns "Zero records returned"

I have loaded my data into S3 (Json format). I have executed "MSCK REPAIR TABLE dqme_violations_v1" I have created my table and it seems to be pointing to the correct place. What gives?

  1. Do the files need to be suffixed with .json?
  2. Any other suggestions? I'm at a loss.

My table (via: show create table dqme_violations_v1):

CREATE EXTERNAL TABLE `dqme_violations_v1`(
  `asset_id` string COMMENT 'from deserializer', 
  `make` string COMMENT 'from deserializer', 
  `serial_number` string COMMENT 'from deserializer', 
  `message_format` string COMMENT 'from deserializer', 
  `content_type` string COMMENT 'from deserializer', 
  `content` string COMMENT 'from deserializer', 
  `rule_id` string COMMENT 'from deserializer', 
  `ruleset` string COMMENT 'from deserializer', 
  `rule_type` string COMMENT 'from deserializer', 
  `error_type` string COMMENT 'from deserializer', 
  `error_name` string COMMENT 'from deserializer', 
  `error_criteria` string COMMENT 'from deserializer')
COMMENT 'The DQME OG Grief violation table'
PARTITIONED BY ( 
  `dt` string)
ROW FORMAT SERDE 
  'org.apache.hive.hcatalog.data.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='make, serialNumber') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://pfm-og-blah-store-use2-s3-prod/violations'
TBLPROPERTIES (
  'transient_lastDdlTime'='1610571957')

MY s3 :

s3://pfm-og-blah-store-use2-s3-prod/violations/dt=2021-01-12-16/

and it contains a json file with a bunch of json in it:

pfm-og-barrel_grief_use2-kfh-prod-2-2021-01-12-16-06-07-0abdaa4f-92db-46ee-8c0c-2001e8b5512a

Upvotes: 0

Views: 839

Answers (2)

markthegrea
markthegrea

Reputation: 3841

TLDR - I didn't have the authority to run MSCK REPAIR TABLE dqme_violations_v1

The skinny:

So running the MSCK command showed no signs of error. It seemed to run and actually listed out the partitions it found that were NOT in the metastore. The "not" should have told me it was not adding them because it showed that each time). Athena really needs better error handling.

If you go to AWSGlue -> Tables -> View partitions, it will list the partitions it has found. Mine was empty.

I work for a large corporation and it has locked down lots of things. So I had to run some sql via a Lambda to add the partitions. We run a batch job once an hour that adds the NEXT hours partition to the table (yes the file is not created yet. Athena doesn't care.)

Upvotes: 1

Nicolas Busca
Nicolas Busca

Reputation: 1305

In this case you have a partition that looks like yyyy-MM-dd-HH. There's a much niftier way to handle partitions using partition projection. Once set up, there's no need to do any repair table or manual ADD PARTITION, you can set it up directly via DDL by adding:

...
TBLPROPERTIES (
  'transient_lastDdlTime'='1610571957',
projection.enabled='true',
projection.dt.type='date',
projection.dt.format='yyyy-MM-dd-HH',
projection.dt.range='NOW-30DAYS,NOW'
)

Upvotes: 0

Related Questions