Reputation: 3071
I am trying to load a file from HDFS into Hive using Spark SQL with the queries below.
hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS src (value STRING)")
hiveContext.sql("LOAD DATA INPATH '/data/spark_test/kv1.txt' INTO TABLE src")
hiveContext.sql("FROM src SELECT *").collect().foreach(println)
What I find is that after the 2nd statement, i.e. loading the file, I see the file under /apps/hive/warehouse/src/,
but it is no longer found at /data/spark_test/kv1.txt.
Why is that? Spark version 1.6.1 is used here.
Upvotes: 0
Views: 280
Reputation: 5834
This is the default behavior of Hive: when you load data into a table using the LOAD DATA
command, Hive moves the original source data into the table's location.
You can find the same file inside the table location; run the commands below to locate the source file.
describe extended src; --copy location
hadoop fs -ls <location>
Since src is an external table, you may instead create the external table directly on top of the data and skip the load step:
hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS src (value STRING) location '/data/spark_test'")
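Putting it together, here is a minimal sketch of that alternative flow, assuming the same hiveContext from the question and the data already sitting in /data/spark_test:

```scala
// Define the external table directly over the existing HDFS directory;
// the file stays at /data/spark_test/kv1.txt -- no LOAD DATA move happens.
hiveContext.sql(
  "CREATE EXTERNAL TABLE IF NOT EXISTS src (value STRING) " +
  "LOCATION '/data/spark_test'")

// Query it immediately; Hive reads the files in place.
hiveContext.sql("SELECT * FROM src").collect().foreach(println)
```

A further benefit: dropping an external table removes only the table metadata, so the files under /data/spark_test are left untouched.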
Upvotes: 1