Sam

Reputation: 517

Unable to insert into Hive partitioned table from Spark

I created an external partitioned table in Hive. The logs show numInputRows, which means the streaming query is running and sending data. But when I connect to Hive using Beeline and run select * or count(*), the table is always empty.

def hiveOrcSetWriter[T](event_stream: Dataset[T])(implicit spark: SparkSession): DataStreamWriter[T] = {

    // _table_loc and _table_checkpoint are defined elsewhere in the enclosing scope
    val hiveOrcSetWriter: DataStreamWriter[T] = event_stream
      .writeStream
      .partitionBy("year", "month", "day")   // one directory level per partition column
      .format("orc")
      .outputMode("append")
      .option("compression", "zlib")
      .option("path", _table_loc)            // the external table's location
      .option("checkpointLocation", _table_checkpoint)

    hiveOrcSetWriter
  }
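For reference, this is roughly how the writer is started (hypothetical usage; events stands in for the actual input Dataset, whose schema must include the year, month, and day columns):

    val query = hiveOrcSetWriter(events).start()   // begin streaming ORC writes to _table_loc
    query.awaitTermination()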

What could be the issue? I'm unable to figure it out.

Upvotes: 3

Views: 1431

Answers (2)

loneStar

Reputation: 4010

msck repair table tablename

This goes and checks the table's location on disk and adds partitions to the metastore if new ones exist.

Add this step to your Spark process so the new data can be queried from Hive; a sketch follows below.
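One way to run the repair from the Spark job itself is a foreachBatch sink (Spark 2.4+), which writes each micro-batch and then repairs the table. This is a sketch adapted from the question's writer; my_db.events is a placeholder table name, and _table_loc / _table_checkpoint are the same paths as in the question:

    val query = event_stream.writeStream
      .foreachBatch { (batch: Dataset[T], _: Long) =>
        // Write this micro-batch as ORC under the external table's location
        batch.write
          .mode("append")
          .partitionBy("year", "month", "day")
          .option("compression", "zlib")
          .orc(_table_loc)
        // Register any newly created partition directories in the metastore
        batch.sparkSession.sql("MSCK REPAIR TABLE my_db.events")
      }
      .option("checkpointLocation", _table_checkpoint)
      .start()

Running the repair per micro-batch keeps the metastore in sync as new year/month/day directories appear, at the cost of an extra metastore call per batch.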

Upvotes: 1

moriarty007

Reputation: 2224

Your streaming job is writing new partition directories under the table location, but the Hive metastore is not aware of them.

When you run a select query on the table, Hive checks the metastore to get the list of the table's partitions. Since the information in the metastore is outdated, the data doesn't show up in the result.

You need to run the

ALTER TABLE <TABLE_NAME> RECOVER PARTITIONS

command from Hive/Spark to update the metastore with the new partition info.
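For instance, from Spark SQL (the table name is a placeholder):

    spark.sql("ALTER TABLE my_db.events RECOVER PARTITIONS")

In open-source Hive the equivalent is MSCK REPAIR TABLE my_db.events, as in the other answer; the ALTER TABLE ... RECOVER PARTITIONS form is the one supported by Spark SQL and Amazon EMR's Hive.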

Upvotes: 1
