Reputation: 99
We would like to create a DataFrame on top of a Hive external table and use the Hive schema and data for computation at the Spark level.
Can we get the schema from the Hive external table and use it as the DataFrame schema?
Upvotes: 1
Views: 24854
Reputation: 1
You can create a DataFrame with your own column names by passing them to toDF():
df = spark.sql("select * from table").toDF("col1", "col2")
Upvotes: 0
Reputation: 61
Load the data into a DataFrame:
df = sqlContext.sql("select * from hive_table")
Get the schema as a StructType:
df.schema
Get the column names of the Hive table:
df.columns
Get the column names with their data types:
df.dtypes
Upvotes: 1
Reputation: 7990
To access a Hive table from Spark, use Spark's HiveContext:
import org.apache.spark.sql.hive.HiveContext
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
// ... do other stuff, then ...
val data = sqlContext.sql("select * from hive_table")
Here data will be your DataFrame with the schema of the Hive table.
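For completeness, a minimal end-to-end sketch of the above; the application name and the table name hive_table are placeholders, and the HiveContext API applies to Spark 1.x:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Placeholder configuration; set the app name/master for your cluster.
val conf = new SparkConf().setAppName("hive-schema-example")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)

// The DataFrame picks up the schema stored in the Hive metastore.
val data = sqlContext.sql("select * from hive_table")
data.printSchema()        // verify the Hive column names and types
val schema = data.schema  // a StructType you can reuse elsewhere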
Upvotes: 2
Reputation: 27373
The Hive metastore knows the schema of your tables and passes this information on to Spark. It does not matter whether the table is external or not:
val df = sqlContext.table(tablename)
where sqlContext is of type HiveContext. You can verify your schema with
df.printSchema
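A short sketch of this approach, assuming an existing SparkContext sc and using the placeholder table name my_table:
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)  // sc is an existing SparkContext

// Works for managed and external tables alike;
// the schema comes from the Hive metastore.
val df = sqlContext.table("my_table")
df.printSchema()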
Upvotes: 10
Reputation: 2092
Spark with Hive support enabled can do this out of the box. Please refer to the docs.
val dataframe = spark.sql("SELECT * FROM table")
val schema = dataframe.schema
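If you also want to reuse that schema for another DataFrame, for example when reading raw files that should follow the Hive table's layout, a sketch along these lines should work (the session setup, table name, and file path are placeholders):
import org.apache.spark.sql.SparkSession

// Placeholder Hive-enabled session; in spark-shell this already exists as `spark`.
val spark = SparkSession.builder()
  .appName("reuse-hive-schema")
  .enableHiveSupport()
  .getOrCreate()

// Pull the schema from the Hive table ...
val schema = spark.sql("SELECT * FROM table").schema

// ... and apply it when reading raw data files.
val raw = spark.read.schema(schema).csv("/path/to/raw/data")
raw.printSchema()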
Upvotes: 2