Reputation: 3841
I have loaded my data into S3 (Json format). I have executed "MSCK REPAIR TABLE dqme_violations_v1" I have created my table and it seems to be pointing to the correct place. What gives?
My table (via: show create table dqme_violations_v1):
CREATE EXTERNAL TABLE `dqme_violations_v1`(
`asset_id` string COMMENT 'from deserializer',
`make` string COMMENT 'from deserializer',
`serial_number` string COMMENT 'from deserializer',
`message_format` string COMMENT 'from deserializer',
`content_type` string COMMENT 'from deserializer',
`content` string COMMENT 'from deserializer',
`rule_id` string COMMENT 'from deserializer',
`ruleset` string COMMENT 'from deserializer',
`rule_type` string COMMENT 'from deserializer',
`error_type` string COMMENT 'from deserializer',
`error_name` string COMMENT 'from deserializer',
`error_criteria` string COMMENT 'from deserializer')
COMMENT 'The DQME OG Grief violation table'
PARTITIONED BY (
`dt` string)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='make, serialNumber')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://pfm-og-blah-store-use2-s3-prod/violations'
TBLPROPERTIES (
'transient_lastDdlTime'='1610571957')
MY s3 :
s3://pfm-og-blah-store-use2-s3-prod/violations/dt=2021-01-12-16/
and it contains a json file with a bunch of json in it:
pfm-og-barrel_grief_use2-kfh-prod-2-2021-01-12-16-06-07-0abdaa4f-92db-46ee-8c0c-2001e8b5512a
Upvotes: 0
Views: 839
Reputation: 3841
TLDR - I didn't have the authority to run MSCK REPAIR TABLE dqme_violations_v1
The skinny:
So running the MSCK command showed no signs of error. It seemed to run and actually listed out the partitions it found that were NOT in the metastore. The "not" should have told me it was not adding them because it showed that each time). Athena really needs better error handling.
If you go to AWSGlue -> Tables -> View partitions, it will list the partitions it has found. Mine was empty.
I work for a large corporation and it has locked down lots of things. So I had to run some sql via a Lambda to add the partitions. We run a batch job once an hour that adds the NEXT hours partition to the table (yes the file is not created yet. Athena doesn't care.)
Upvotes: 1
Reputation: 1305
In this case you have a partition that looks like yyyy-MM-dd-HH
. There's a much niftier way to handle partitions using partition projection. Once set up, there's no need to do any repair table or manual ADD PARTITION
, you can set it up directly via DDL by adding:
...
TBLPROPERTIES (
'transient_lastDdlTime'='1610571957',
projection.enabled='true',
projection.dt.type='date',
projection.dt.format='yyyy-MM-dd-HH',
projection.dt.range='NOW-30DAYS,NOW'
)
Upvotes: 0