Reputation: 740
I've seen from two sources that right now you cannot interact in any meaningful way with HIVE Transactional Tables from Spark.
Hive Transactional Tables are not readable by spark
I see Databricks has released a Transactional feature called Databricks Delta. Is it possible to now read HIVE Transactional Tables using this feature?
Upvotes: 2
Views: 7347
Reputation: 469
I don't see the point on stressing on just Spark for accessing Hive ACID.
Actually Spark relies on a host language, Python and Scala being the most popular choices.
You could use Hive ACID from Python with no issues, this is a very well proven integration.
Your data can reside on Spark dataframes or RDDs, but as long as you can transfer it to standard Python data structures, you can interoperate with Hive ACID directly from these.
Upvotes: 0
Reputation: 272
Nope. Not the Hive Transactional tables. You create a new type of table called Databricks Delta Table(Spark table of parquets) and leverage the Hive metastore to read/write to these tables.
Its a kind of External table but its more like data to schema. More of Spark and Parquet.
The solution for your problem might be to read the hive files and Impose the schema accordingly in a Databricks notebook and then save it as a databricks delta table.
like this : df.write.mode('overwrite').format('delta').save(/mnt/out/put/path)
You would still need to write a DDL pointing to that location.Just FYI DELTA table is Transactional.
Upvotes: 3