Sam
Sam

Reputation: 517

Select count(*) issue with hive and spark

I get the correct count after I run the ANALYZE statement. But my problem is, it needs to be run every time of the count is updated. Technically I should be able to update the count for the same partition.

But it returns the same count if I don't execute the ANALYZE statement.

This is the query I execute for the count to be updated.

ANALYZE TABLE bi_events_identification_carrier_sam PARTITION(year, month, day) COMPUTE STATISTICS;

And executing is not convenient at all. any ideas?

Upvotes: 0

Views: 2004

Answers (1)

moriarty007
moriarty007

Reputation: 2224

Your count(*) query is using stats to get the result.

If you are using spark to write data, then you can set spark.sql.statistics.size.autoUpdate.enabled to true. This makes sure that Spark updates the table stats automatically after the write is done.

If you are using Hive, you can set set hive.stats.autogather=true;.

Once you enable these settings, then the write query will automatically update the stats and the subsequent read query will work fine.

Upvotes: 2

Related Questions