Reputation: 31
I have a Postgres table with a text column (detail), which I have declared as STRING in Hive. The import completes successfully when I run it from Sqoop or Spark, but a lot of the data in the detail column is missing, and many empty rows are created in the Hive table.
Can anyone help me on this?
Ex: the detail column contains the data below:
line1 sdhfdsf dsfdsdfdsf dsfs
line2 jbdfv df ffdkjbfd
jbdsjbfds dsfsdfb dsfds
dfds dsfdsfds dsfdsdskjnfds
sdjfbdsfdsdsfds
Only "line1 sdhfdsf dsfdsdfdsf dsfs" is imported into the Hive table.
The remaining lines show up as empty rows.
Upvotes: 1
Views: 769
Reputation: 31
Here is the solution
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;

SparkConf sparkConf = new SparkConf().setAppName("HiveSparkSQL");
SparkContext sc = new SparkContext(sparkConf);
HiveContext sqlContext = new HiveContext(sc);
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true");

// JDBC connection to the Postgres source
String url = "jdbc:postgresql://host:5432/dbname?user=**&password=***";
Map<String, String> options = new HashMap<String, String>();
options.put("url", url);
options.put("dbtable", "(select * from abc.table limit 50) as act1");
options.put("driver", "org.postgresql.Driver");

// Read over JDBC and write as Parquet, which preserves embedded newlines
DataFrame jdbcDF = sqlContext.read().format("jdbc").options(options).load();
jdbcDF.write().format("parquet").mode(SaveMode.Append).saveAsTable("act_parquet");
Upvotes: 0
Reputation: 192
Hive's text file formats cannot store multi-line values: a newline terminates the row, so everything after the first line of a value is lost or turns into empty rows. To retain the newline characters, load the data into a binary file format such as Avro or Parquet. If you don't need to retain them, you can strip them at import time with Sqoop's --hive-drop-import-delims option.
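For reference, --hive-drop-import-delims removes \n, \r, and \001 (Hive's default field delimiter) from string fields before they are written out. A minimal Java sketch of that same transformation, so you can see why the multi-line value then fits on one text row (the class and method names here are illustrative, not part of Sqoop):

```java
public class HiveDelims {

    // Remove the characters Hive's default text format treats as
    // row/field delimiters: \n, \r, and \001 (Ctrl-A).
    static String dropHiveDelims(String s) {
        return s.replaceAll("[\\n\\r\u0001]", "");
    }

    public static void main(String[] args) {
        String detail = "line1 a\nline2 b\r\njb c";
        // All delimiter characters are gone, so the value occupies one row
        System.out.println(dropHiveDelims(detail)); // -> line1 aline2 bjb c
    }
}
```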
Upvotes: 1