sdinesh94

Reputation: 1178

How to store a Spark DataFrame in Hive in uncompressed text format

I am trying to store a DataFrame in an external Hive table. When I run the following:

 recordDF.write.option("path", "hdfs://quickstart.cloudera:8020/user/cloudera/hadoop/hive/warehouse/VerizonProduct").saveAsTable("productstoreHTable")

At the HDFS location where the table was supposed to be, I get this instead:

-rw-r--r-- 3 cloudera cloudera 0 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/_SUCCESS

-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00000-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet

-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00001-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet

-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00002-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet

-rw-r--r-- 3 cloudera cloudera 482 2016-12-25 18:58 hadoop/hive/warehouse/VerizonProduct/part-r-00003-0acdcc6d-893b-4e9d-b1d6-50bf02bea96a.snappy.parquet

How do I store it in uncompressed text format?

Thanks

Upvotes: 0

Views: 2838

Answers (3)

kolya_metallist

Reputation: 619

Try .option("fileFormat", "textfile"). See the "Specifying storage format for Hive tables" section of the Spark SQL documentation.

Upvotes: 0

sunitha

Reputation: 1528

The solution with format("csv") threw the warning "Couldn't find corresponding Hive SerDe for data source provider csv.", and the table was not created in the desired way. One alternative is to create an external table first:

sqlContext.sql("CREATE EXTERNAL TABLE test(col1 int,col2 string) STORED AS TEXTFILE LOCATION '/path/in/hdfs'")

Then write the DataFrame to the same location:

dataFrame.write.format("com.databricks.spark.csv").option("header", "true").save("/path/in/hdfs")
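The two steps in this answer can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's exact code: the helper names external_table_ddl and write_delimited_text are hypothetical, and it assumes Spark 2.x's built-in csv source. It deviates from the snippet above in two ways. It adds a ROW FORMAT DELIMITED clause, because Hive's default TEXTFILE field delimiter is Ctrl-A rather than a comma, and it leaves the header off, because Hive would otherwise read the header line as a data row.

```python
def external_table_ddl(table, columns, location, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement for a delimited text table.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ", ".join("{} {}".format(name, hive_type) for name, hive_type in columns)
    return (
        "CREATE EXTERNAL TABLE {table} ({cols}) "
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delim}' "
        "STORED AS TEXTFILE LOCATION '{loc}'"
    ).format(table=table, cols=cols, delim=delimiter, loc=location)


def write_delimited_text(df, location, delimiter=","):
    """Write df as uncompressed delimited text files under location."""
    (df.write
       .format("csv")
       .option("sep", delimiter)         # must match the table's field delimiter
       .option("header", "false")        # no header line in the data files
       .option("compression", "none")    # plain, uncompressed text
       .mode("overwrite")                # assumption: replacing existing files is acceptable
       .save(location))


# Table name, columns, and path are taken from the answer above.
ddl = external_table_ddl("test", [("col1", "int"), ("col2", "string")], "/path/in/hdfs")
print(ddl)
```

With a Hive-enabled session, the statement would be run with sqlContext.sql(ddl), followed by write_delimited_text(dataFrame, "/path/in/hdfs").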

Upvotes: 1

user7337271

Reputation: 1712

You can add a format option:

recordDF.write.option("path", "...").format("text").saveAsTable("...")

or

recordDF.write.option("path", "...").format("csv").saveAsTable("...")
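One caveat with format("text"): as far as I know, Spark's text source accepts only a single string column, so a multi-column DataFrame has to be collapsed into one column first. A minimal PySpark sketch of that step, assuming comma-separated output is wanted; the helper names single_text_column_expr and write_as_plain_text are hypothetical and not part of Spark:

```python
def single_text_column_expr(columns, sep=","):
    """Build a Spark SQL expression joining all columns into one string column."""
    # Cast every column to string explicitly, quoting names with backticks.
    casted = ", ".join("cast(`{}` as string)".format(c) for c in columns)
    # Note: concat_ws skips NULL inputs, so rows containing NULLs lose a field.
    return "concat_ws('{}', {}) AS value".format(sep, casted)


def write_as_plain_text(df, path, sep=","):
    """Collapse df to one string column and write it as uncompressed text."""
    (df.selectExpr(single_text_column_expr(df.columns, sep))
       .write
       .format("text")
       .option("compression", "none")
       .save(path))


expr = single_text_column_expr(["col1", "col2"])
print(expr)
```

Calling write_as_plain_text(recordDF, path) would then write one line per row; the csv format handles multiple columns directly and does not need this step.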

Upvotes: 1
