Reputation: 4739
I am using AWS Glue as my Hive metastore. I have an hourly job that writes files to a registered partition.
Table definition:
CREATE EXTERNAL TABLE table_name (
column_1 STRING,
column_2 STRING
)
PARTITIONED BY (process_date DATE)
STORED AS PARQUET
LOCATION "s3://bucket/table_name/";
Partition registration:
spark.sql("""ALTER TABLE table_name ADD IF NOT EXISTS PARTITION (process_date='2019-11-13')
LOCATION 's3://bucket/table_name/process_date=2019-11-13'""")
The S3 locations of the part files for that partition are:
s3://bucket/table_name/process_date=2019-11-13/hour=00/part-01.parquet
s3://bucket/table_name/process_date=2019-11-13/hour=00/part-02.parquet
s3://bucket/table_name/process_date=2019-11-13/hour=01/part-01.parquet
s3://bucket/table_name/process_date=2019-11-13/hour=01/part-02.parquet
I understand that if I add hour=00 and hour=01 to the partition location, it will work in Spark SQL. But with the current layout, the data is queryable via Hive but not through Spark SQL.
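To make the mismatch concrete, here is a minimal Python sketch (plain string handling, no Spark required) that reports which subdirectories sit between the registered partition location and the part files. The helper name nested_dirs is hypothetical; the paths are the ones listed above.

```python
# Registered partition location and the part files listed in the question.
PARTITION_LOCATION = "s3://bucket/table_name/process_date=2019-11-13"
PART_FILES = [
    "s3://bucket/table_name/process_date=2019-11-13/hour=00/part-01.parquet",
    "s3://bucket/table_name/process_date=2019-11-13/hour=00/part-02.parquet",
    "s3://bucket/table_name/process_date=2019-11-13/hour=01/part-01.parquet",
    "s3://bucket/table_name/process_date=2019-11-13/hour=01/part-02.parquet",
]


def nested_dirs(partition_location, file_paths):
    """Return the sorted subdirectories that lie between the partition
    location and each file (hypothetical helper, for illustration only)."""
    prefix = partition_location.rstrip("/") + "/"
    nested = set()
    for path in file_paths:
        if not path.startswith(prefix):
            continue  # file is outside this partition
        relative = path[len(prefix):]
        parent = relative.rpartition("/")[0]  # empty if file is directly inside
        if parent:
            nested.add(parent)
    return sorted(nested)


print(nested_dirs(PARTITION_LOCATION, PART_FILES))  # ['hour=00', 'hour=01']
```

A non-empty result means the files are one or more levels below the location stored in the metastore, which is the situation described here.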
I've also tried adding these confs to my spark-shell, but no luck.
"spark.hadoop.mapreduce.input.fileinputformat.input.dir.recursive=true"
"spark.hadoop.hive.mapred.supports.subdirectories=true"
Upvotes: 0
Views: 1915
Reputation: 851
I tested the scenario by creating a table similar to yours, and the configuration below worked for me:
First set:
sqlContext.setConf("spark.sql.hive.convertMetastoreParquet", "false")
Then this:
sqlContext.setConf("mapred.input.dir.recursive", "true")
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
You can read more here: [1] https://home.apache.org/~pwendell/spark-nightly/spark-branch-2.2-docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
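The three settings above can be grouped into one helper so they are always applied together before the table is first queried. This is only a sketch in PySpark style, assuming a Spark 2.x or later session object that exposes spark.conf.set; the names RECURSIVE_READ_CONFS and apply_recursive_read_confs are hypothetical. Setting spark.sql.hive.convertMetastoreParquet to false makes Spark SQL read the table through the Hive SerDe instead of its built-in Parquet support, which is what lets the recursive-directory option take effect.

```python
# Options taken from the answer above.
RECURSIVE_READ_CONFS = {
    # Read Hive metastore Parquet tables via the Hive SerDe
    # rather than Spark's built-in Parquet reader.
    "spark.sql.hive.convertMetastoreParquet": "false",
    # Let the input format descend into subdirectories (hour=00, hour=01, ...).
    "mapred.input.dir.recursive": "true",
    # Interpret Parquet binary columns as strings.
    "spark.sql.parquet.binaryAsString": "true",
}


def apply_recursive_read_confs(spark):
    """Apply each option to a SparkSession-like object (hypothetical helper).

    Returns a copy of the applied settings so callers can log them.
    """
    for key, value in RECURSIVE_READ_CONFS.items():
        spark.conf.set(key, value)
    return dict(RECURSIVE_READ_CONFS)
```

In a PySpark shell this would be called as apply_recursive_read_confs(spark), followed by a query such as spark.sql("SELECT * FROM table_name WHERE process_date = '2019-11-13'").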
Upvotes: 1
Reputation: 851
I think you have enabled the Glue catalog in hive-site.xml but not in spark-hive-site.xml.
Your EMR configuration classifications should also include the section below:
[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]
ref: [1] https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html
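Hand-edited classification JSON is easy to get wrong (a stray trailing comma is enough for it to be rejected), so one option is to build it in code and serialize it. A minimal Python sketch follows; the function name build_spark_hive_site_config is hypothetical, and the property key and value are the ones from the section above.

```python
import json

GLUE_CLIENT_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def build_spark_hive_site_config():
    """Return the spark-hive-site classification as a list of dicts
    (hypothetical helper, for illustration only)."""
    return [
        {
            "Classification": "spark-hive-site",
            "Properties": {
                "hive.metastore.client.factory.class": GLUE_CLIENT_FACTORY,
            },
        }
    ]


# json.dumps always emits valid JSON, so the output can be saved to a file
# and supplied as the cluster's configuration.
config_json = json.dumps(build_spark_hive_site_config(), indent=2)
```

The resulting string can be written to a file and passed to the cluster at creation time, for example through the --configurations option of aws emr create-cluster.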
Upvotes: 0