Reputation: 529
I've created the following table in Hive:
CREATE TABLE mytable (..columns...) PARTITIONED BY (load_date string) STORED AS ...
And I'm trying to insert data to my table with spark as follow:
Dataset<Row> dfSelect = df.withColumn("load_date","15_07_2018");
dfSelect.write().mode("append").partitionBy("load_date").save(path);
And also make the following configuration:
sqlContext().setConf("hive.exec.dynamic.partition","true");
sqlContext().setConf("hive.exec.dynamic.partition.mode","nonstrict");
And after I make the write command I see on HDFS the directory /myDbPath/load_date=15_07_2018
, which contains the file that I've written but when I make query like:
show partitions mytable
or
select * from mytable where load_date="15_07_2018"
I get 0 records.
What happened and how can I fix this?
If I run the following command in Hue:
msck repair table mytable
I solve the problem, how can I do it in my code?
Upvotes: 1
Views: 647
Reputation: 31520
Hive stores a list of partitions for each table in its metastore. If, however, new partitions are directly added to HDFS (say by using hadoop fs -put command (or) .save..etc), the metastore (and hence Hive) will not be aware of these partitions unless the user runs either of the below commands
msck repair table <db.name>.<table_name>;
(or)
We can also add partition by using alter table statement by using this way we need to add each and every newly created partition to the table
alter table <db.name>.<table_name> add partition(load_date="15_07_2018") location <hdfs-location>;
Run either of the above statements and then check the data again for load_date="15_07_2018"
For more details refer these links add partitions and msck repair table
Upvotes: 1