venkata

Reputation: 99

How to create a DataFrame from a Hive external table

We would like to create a DataFrame on top of a Hive external table and use the Hive schema and data for computation at the Spark level.

Can we get the schema from the Hive external table and use it as the DataFrame schema?

Upvotes: 1

Views: 24854

Answers (5)

Amit kumar

Reputation: 1

You can create a DataFrame with your own column names by passing them to toDF():

df = spark.sql("select * from table").toDF("col1", "col2")
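
A minimal Scala sketch of the same pattern, assuming a Hive-enabled session named spark and a two-column table my_db.events (both placeholders); the number of names passed to toDF must match the table's column count:

// Rename the two columns while keeping the Hive-derived types
val renamed = spark.sql("select * from my_db.events").toDF("event_id", "event_time")
renamed.printSchema()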

Upvotes: 0

AJIT SONAWANE

Reputation: 61

Load the data into a DataFrame:

df=sqlContext.sql("select * from hive_table")

Get the schema (a StructType):

df.schema

Get the column names of the Hive table:

df.columns

Get the column names with their data types:

df.dtypes
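
Put together, a minimal Scala sketch of these steps, assuming a Hive-enabled sqlContext already exists and that hive_table is a placeholder table name:

val df = sqlContext.sql("select * from hive_table")

// StructType describing the table, as reported by the Hive metastore
val schema = df.schema

// Array of column names
val cols = df.columns

// Array of (columnName, dataType) pairs, e.g. ("id", "IntegerType")
val types = df.dtypes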

Upvotes: 1

Sandeep Singh

Reputation: 7990

To access a Hive table from Spark, use Spark's HiveContext:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("hive-example") // your existing Spark configuration
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

// ... do other stuff ... then

val data = sqlContext.sql("select * from hive_table")

Here, data will be your DataFrame, carrying the schema of the Hive table.

Upvotes: 2

Raphael Roth

Reputation: 27373

The Hive metastore knows the schema of your tables and passes this information to Spark. It does not matter whether the table is external or not:

val df = sqlContext.table(tablename)

where sqlContext is of type HiveContext. You can verify your schema with

df.printSchema
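
If the goal is to reuse that schema elsewhere (as the question asks), here is one hedged sketch; the table name and file path are placeholders, not from this answer:

// StructType reported by the Hive metastore for the external table
val hiveSchema = sqlContext.table("my_external_table").schema

// Reuse it, for example, to read raw files that share the same layout
val rawDf = sqlContext.read.schema(hiveSchema).json("/path/to/raw/json")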

Upvotes: 10

Alex Naspo

Reputation: 2092

Spark with Hive support enabled can do this out of the box. Please refer to the docs.

val dataframe = spark.sql("SELECT * FROM table")
val schema = dataframe.schema
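
For completeness, a minimal sketch of creating such a Hive-enabled session in Spark 2.x; the app name is a placeholder:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-schema-example") // placeholder
  .enableHiveSupport()            // requires Spark built/run with Hive support
  .getOrCreate()

val dataframe = spark.sql("SELECT * FROM table")
dataframe.printSchema()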

Upvotes: 2
