Reputation: 73
I am loading data from a Hive table (hivetable1), modifying it with Spark, and saving the result into another Hive table (hivetable2). When I run select * from hivetable2 it shows the data correctly, but when I look at the same file in HDFS it is all junk characters, as shown below. When I try to export the same data to Postgres using Sqoop, it appends the entire row into a single column of the Postgres table.
Spark script:
spark = SparkSession \
.builder \
.appName("Python Spark SQL Hive integration example") \
.config("hive.metastore.uris", "thrift://localhost:9083") \
.config("spark.sql.catalogImplementation=hive") \
.enableHiveSupport() \
.getOrCreate()
df = spark.sql("select * from hivetable1")
df.write.format("hive").mode('overwrite').option("delimiter", "\t").saveAsTable("hivetable2")
HDFS file data:
hadoop fs -cat /user/hive/warehouse/tb.db/hivetable2/part-0000
lnullunknownnullnullnull\N\N\N\Nnullnullnullnullnull0.00.0nullnull\Nnull\Nnullnullnullnullnull\Nnullnull\Nnullnullnull\Nnullnullnull\Nnullnull
Sqoop Export:
sqoop export --connect jdbc:postgresql://localhost:5432/postgres?stringtype=unspecified -m 1 --table test --export-dir /user/hive/warehouse/tb.db/hivetable2 \
--username test --password test --input-fields-terminated-by '\t'
Upvotes: 0
Views: 305
Reputation: 150
I would suggest creating the Hive table separately and appending the data into it. Once that is done, you can Sqoop it back to the database.
df.write.mode("overwrite").insertInto("database_name.table", overwrite=True)
Upvotes: 0