Lavanya varma

Reputation: 75

Loading local file into HDFS using hdfs put vs spark

The use case is to load a local file into HDFS. Below are two approaches to do the same; please suggest which one is more efficient.

Approach 1: Using the hdfs put command

hadoop fs -put /local/filepath/file.parquet /user/table_nm/

Approach 2: Using Spark

spark.read.parquet("file:///local/filepath/file.parquet").createOrReplaceTempView("temp")
spark.sql("insert into table table_nm select * from temp")

Note:

  1. The source file can be in any format.
  2. No transformations are needed for the file load.
  3. table_nm is a Hive external table pointing to /user/table_nm/

Upvotes: 0

Views: 606

Answers (1)

Ged

Reputation: 18128

Assuming the local .parquet files are already built, using -put will be faster, as there is no overhead of starting a Spark application.

If there are many files, there is still simply less work to do via -put.
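For the many-files case, a single -put invocation can cover them all, since the shell expands globs and hadoop fs -put accepts multiple source arguments. A minimal sketch, reusing the paths from the question (adjust to your layout):

```shell
# Copy every local parquet file into the external table's HDFS directory
# in one command; -f overwrites files that already exist at the target.
hadoop fs -put -f /local/filepath/*.parquet /user/table_nm/
```

Because table_nm is an external table over /user/table_nm/, the new files are visible to Hive queries immediately, with no insert step needed.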

Upvotes: 1
