Neethu Lalitha

Reputation: 3071

Unable to see HDFS file after loading to Hive using Spark SQL

I am trying to load a file from HDFS into Hive using Spark SQL with the queries below.

hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS src (value STRING)")
hiveContext.sql("LOAD DATA INPATH '/data/spark_test/kv1.txt' INTO TABLE src")
hiveContext.sql("FROM src SELECT *").collect().foreach(println)

What I find is that after the 2nd statement, i.e. after loading the file, I can see the file under /apps/hive/warehouse/src/, but it is no longer found at /data/spark_test/kv1.txt. Why is that? Spark version 1.6.1 is used here.

Upvotes: 0

Views: 280

Answers (1)

Rahul Sharma

Reputation: 5834

This is the default behavior of Hive: when you load data into a table with the LOAD DATA command, Hive moves the original source data to the table's location.
You can find the same file inside the table location; run the commands below to locate the source file.

describe extended src;   -- copy the location from the output
hadoop fs -ls <location>
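
You can also confirm this from the same Spark shell; a minimal sketch (the exact layout of the DESCRIBE EXTENDED output varies between Hive versions):

// Print the table metadata; the location field shows where Hive moved the file
hiveContext.sql("DESCRIBE EXTENDED src").collect().foreach(println)

Listing that location with hadoop fs -ls should now show kv1.txt, confirming the file was moved rather than copied.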

Since src is an external table, you can create it directly on top of the data instead of loading the data in a separate step.

hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS src (value STRING) location '/data/spark_test'")

Upvotes: 1
