Reputation: 13
I am using spark.readStream
to read data from Kafka and running an explode on the resulting dataframe.
I am trying to save the result of the explode in a Hive table and I am not able to find any solution for that.
I tried the following method but it doesn't work (it runs but I don't see any new partitions created)
val query = tradelines.writeStream.outputMode("append")
.format("memory")
.option("truncate", "false")
.option("checkpointLocation", checkpointLocation)
.queryName("tl")
.start()
sc.sql("set hive.exec.dynamic.partition.mode=nonstrict;")
sc.sql("INSERT INTO TABLE default.tradelines PARTITION (dt) SELECT * FROM tl")
Upvotes: 0
Views: 955
Reputation: 191983
Check HDFS for the dt
partitions on the file system
You need to run MSCK REPAIR TABLE
on the hive table to see new partitions.
If you aren't doing anything special with Spark, then it's worth pointing out that Kafka Connect HDFS is capable of registering Hive partitions directly from Kafka.
Upvotes: 1