Aravind Reddy
Aravind Reddy

Reputation: 31

How to load Postgress "Text" data type into HIVE

I have a postgress table which has text column (detail). I have declared detail as STRING in Hive. It is getting imported successfully When i try to import it from SQOOP or SPark . However i am missing lot of data which is available in detail column and lot of empty rows are getting created in hive table.

Can anyone help me on this?

Ex: detail column has below data

line1 sdhfdsf dsfdsdfdsf dsfs
line2 jbdfv df ffdkjbfd
jbdsjbfds dsfsdfb dsfds
dfds dsfdsfds dsfdsdskjnfds
sdjfbdsfdsdsfds

Only "line1 sdhfdsf dsfdsdfdsf dsfs " is getting imported into hive table.

I can see empty rows for remaining lines.

Upvotes: 1

Views: 769

Answers (2)

Aravind Reddy
Aravind Reddy

Reputation: 31

Here is the solution

 SparkConf sparkConf = new SparkConf().setAppName("HiveSparkSQL");
    SparkContext sc = new SparkContext(sparkConf);

    HiveContext sqlContext= new HiveContext(sc);
    sqlContext.setConf("spark.sql.parquet.binaryAsString","true");

    String url="jdbc:postgresql://host:5432/dbname?user=**&password=***";

    Map<String, String> options = new HashMap<String, String>();
    options.put("url", url);
    options.put("dbtable", "(select * from abc.table limit 50) as act1");
    options.put("driver", "org.postgresql.Driver");

    DataFrame jdbcDF  = sqlContext.read().format("jdbc").options(options).load(); 
    jdbcDF.write().format("parquet").mode(SaveMode.Append).saveAsTable("act_parquet");

Upvotes: 0

Nick Burns
Nick Burns

Reputation: 192

Hive cannot support multiple lines in text file formats. You must load this data into a binary file, Avro or Parquet, to retain newline characters. If you don't need to retain them then you can strip them with hive-drop-import-delims

Upvotes: 1

Related Questions