Ya Ko

Reputation: 529

Write to dynamic partition Java-Spark

I've created the following table in Hive:

CREATE TABLE mytable (..columns...) PARTITIONED BY (load_date string) STORED AS ...

I'm trying to insert data into the table with Spark as follows:

Dataset<Row> dfSelect = df.withColumn("load_date", functions.lit("15_07_2018"));
dfSelect.write().mode("append").partitionBy("load_date").save(path);

I've also set the following configuration:

sqlContext().setConf("hive.exec.dynamic.partition","true");
sqlContext().setConf("hive.exec.dynamic.partition.mode","nonstrict");

After the write, I can see the directory /myDbPath/load_date=15_07_2018 on HDFS, and it contains the file I wrote. But when I run a query such as:

show partitions mytable

or

select * from mytable where load_date="15_07_2018"

I get 0 records.

What happened and how can I fix this?

EDIT

If I run the following command in Hue:

msck repair table mytable

the problem is solved. How can I do this in my code?

Upvotes: 1

Views: 647

Answers (1)

notNull

Reputation: 31520

Hive stores a list of partitions for each table in its metastore. If new partitions are added directly to HDFS (for example with the hadoop fs -put command, or with Spark's .save()), the metastore (and hence Hive) will not be aware of these partitions unless the user runs one of the commands below.

  1. Metastore check command (MSCK REPAIR TABLE):

msck repair table <db.name>.<table_name>;

(or)

  2. ALTER TABLE ... ADD PARTITION for each of the newly added partitions:

alter table <db.name>.<table_name> add partition(load_date="15_07_2018") location <hdfs-location>;

With ALTER TABLE you have to register each newly created partition individually, whereas MSCK REPAIR discovers all of them at once.

Run either of the above statements, then query the data again for load_date="15_07_2018".
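To run these statements from your code rather than from Hue, you can issue the same SQL through spark.sql(). A minimal sketch, assuming a SparkSession built with Hive support and a hypothetical table name mydb.mytable (substitute your own database and table):

```java
import org.apache.spark.sql.SparkSession;

public class RepairPartitions {
    public static void main(String[] args) {
        // Assumption: Hive support must be enabled so spark.sql() talks to the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("repair-partitions")
                .enableHiveSupport()
                .getOrCreate();

        // Option 1: discover every partition directory on HDFS at once.
        spark.sql("MSCK REPAIR TABLE mydb.mytable");

        // Option 2: register only the single partition that was just written.
        spark.sql("ALTER TABLE mydb.mytable ADD IF NOT EXISTS "
                + "PARTITION (load_date='15_07_2018')");

        spark.stop();
    }
}
```

Alternatively, writing with dfSelect.write().mode("append").insertInto("mydb.mytable") (or saveAsTable) goes through the metastore, so the partition metadata is updated as part of the write and no repair step is needed.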

For more details, refer to the Hive documentation on ALTER TABLE ... ADD PARTITION and MSCK REPAIR TABLE.

Upvotes: 1
