Reputation: 1193
I am trying to understand the Spark DataFrame API method called saveAsTable.
I have the following question: for
df7.write.saveAsTable("t1")
(assuming t1 did not exist earlier), will the newly created table be a Hive table that can be read outside Spark using HiveQL? (I am new to big data processing, so pardon me if the question is not phrased properly.)
Upvotes: 2
Views: 372
Reputation: 167
Yes, you can. The table can be partitioned by a column, but it cannot be bucketed (bucketing is an incompatibility between Spark and Hive).
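A minimal sketch of that distinction (assuming a SparkSession built with Hive support and a DataFrame df7 that has a country column; the names and input path are illustrative):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("saveAsTableExample")
  .enableHiveSupport()   // register tables in the Hive metastore
  .getOrCreate()

val df7 = spark.read.parquet("/data/input")   // hypothetical input path

// Partitioning is visible to Hive:
df7.write
  .partitionBy("country")
  .format("parquet")
  .saveAsTable("t1")

// Bucketing, by contrast, uses Spark's own bucketing scheme,
// which Hive does not understand:
// df7.write.bucketBy(8, "country").saveAsTable("t1_bucketed")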
Upvotes: 0
Reputation: 1352
Yes. The newly created table will be a Hive table and can be queried from the Hive CLI, but only if the DataFrame is created from a single, non-partitioned input HDFS path.
Below is the relevant documentation comment from the DataFrameWriter.scala class. Documentation link
When the DataFrame is created from a non-partitioned HadoopFsRelation with a single input path, and the data source provider can be mapped to an existing Hive builtin SerDe (i.e. ORC and Parquet), the table is persisted in a Hive compatible format, which means other systems like Hive will be able to read this table. Otherwise, the table is persisted in a Spark SQL specific format.
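To illustrate that distinction, a hedged sketch (table names are hypothetical; per the comment above, Parquet and ORC map to built-in Hive SerDes, while a source like JSON does not):

// Hive-compatible: Parquet maps to a built-in Hive SerDe,
// so Hive can read the resulting table directly.
df7.write.format("parquet").saveAsTable("t1_hive_readable")

// Spark-specific: JSON has no such SerDe mapping here, so the
// table is persisted in Spark SQL's own format and Hive will
// not be able to query it.
df7.write.format("json").saveAsTable("t1_spark_only")

After the first write, a query such as SELECT * FROM t1_hive_readable LIMIT 10; from the Hive CLI should work, assuming Spark and Hive share the same metastore.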
Upvotes: 0