Omry Atia
Omry Atia

Reputation: 83

impala CREATE EXTERNAL TABLE and remove double quotes

i got data on CSV for example : "Female","44","0","0","Yes","Govt_job","Urban","103.59","32.7","formerly smoked"

i put it as hdfs with hdfs dfs put

and now i want to create external table from it on impala (not in hive) there is an option without the double quotes ?

this is what i run by impala-shell:

CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext
( `gender` STRING,`age` STRING,`hypertension` STRING,`heart_disease` STRING,`ever_married` STRING,`work_type` STRING,`Residence_type` STRING,`avg_glucose_level` STRING,`bmi` STRING,`smoking_status` STRING ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION "/user/test/tmp/test1"

Update 28.11

i managed to do it by create the external and then create a VIEW as select with case when concat() each col.

Upvotes: 1

Views: 2146

Answers (1)

ajdams
ajdams

Reputation: 2314

Impala uses the Hive metastore so anything created in Hive is available from Impala after issuing an INVALIDATE METADATA dbname.tablename. HOWEVER, to remove the quotes you need to use the Hive Serde library 'org.apache.hadoop.hive.serde2.OpenCSVSerde' and this is not accessible from Impala. My suggestion would be to do the following:

  1. Create the external table in Hive

    CREATE EXTERNAL TABLE IF NOT EXISTS test_test.test1_ext ( gender STRING, age STRING, hypertension STRING, heart_disease STRING, ever_married STRING, work_type STRING, Residence_type STRING, avg_glucose_level STRING, bmi STRING, smoking_status STRING ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( "separatorChar" = ",", "quoteChar" = """ )
    STORED AS TEXTFILE LOCATION "/user/test/tmp/test1"

  2. Create a managed table in Hive using CTAS

    CREATE TABLE mytable AS SELECT * FROM test_test.test1_ext;

  3. Make it available in Impala

    INVALIDATE METADATA db.mytable;

Upvotes: 2

Related Questions