StackAddict

Reputation: 423

Upload .csv data whose fields are enclosed in quotes into Hive

My .csv file has every field enclosed in double quotes:

    "13","9827259163","0","D","2"
    "13","9827961481","0","D","2"
    "13","9827202228","0","A","2"
    "13","9827529897","0","A","2"
    "13","9827700249","0","A","2"
    "12","9883219029","0","A","2"
    "17","9861065312","0","A","2"
    "17","9861220761","0","D","2"
    "13","9827438384","0","A","2"
    "13","9827336733","0","D","2"
    "13","9827380905","0","D","2"
    "13","9827115358","0","D","2"
    "17","9861475884","0","D","2"
    "17","9861511646","0","D","2"
    "17","9861310397","0","D","2"
    "13","9827035035","0","A","2"
    "13","9827304969","0","D","2"
    "13","9827355786","0","A","2"
    "13","9827702373","0","A","2"

As in MySQL, I have tried using the `ENCLOSED BY` keyword as follows:

CREATE EXTERNAL TABLE dnd (ServiceAreaCode varchar(50), PhoneNumber varchar(15), Preferences varchar(15), Opstype varchar(15), PhoneType varchar(10))
ROW FORMAT DELIMITED
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\n'
LOCATION '/dnd';

But it gives the following error:

NoViableAltException(26@[1704:103: ( tableRowFormatMapKeysIdentifier )?])
    at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
    at org.antlr.runtime.DFA.predict(DFA.java:144)
    at org.apache.hadoop.hive.ql.parse.HiveParser.rowFormatDelimited(HiveParser.java:30427)
    at org.apache.hadoop.hive.ql.parse.HiveParser.tableRowFormat(HiveParser.java:30662)
    at org.apache.hadoop.hive.ql.parse.HiveParser.createTableStatement(HiveParser.java:4683)
    at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:2144)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1398)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1036)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:404)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
    at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:456)
    at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:466)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:748)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 5:33 cannot recognize input near 'ENCLOSED' 'BY' ''"'' in serde properties specification

Is there a way I can import this file directly? Thanks in advance.

Upvotes: 0

Views: 2198

Answers (3)

dinesh rajput

Reputation: 114

I loaded quoted CSV data into a Hive table like this: first download a CSV SerDe (I downloaded csv-serde-1.1.2.jar), then:

hive> add jar /opt/hive-1.1.1/lib/csv-serde-1.1.2.jar;
hive> create table t1 (<schema>) row format serde 'com.bizo.hive.serde.csv.CSVSerde' with serdeproperties ("separatorChar" = ",") LOCATION '/user/hive/warehouse/dwb/ot1/';

Then add the SerDe jar to hive-site.xml as shown below, so that the table can be queried from the Hive shell:

<property>
  <name>hive.aux.jars.path</name>
  <value>hdfs://master-ip:54310/hive-serde/csv-serde-1.1.2.jar</value>
</property>

Upvotes: 1

Rengasamy

Reputation: 1043

There is another way: use a SerDe. Please download the SerDe jar from this link: https://github.com/downloads/IllyaYalovyy/csv-serde/csv-serde-0.9.1.jar

Then follow the steps below at the Hive prompt:

add jar path/to/csv-serde.jar;

create table dnd (ServiceAreaCode varchar(50), PhoneNumber varchar(15), Preferences varchar(15), Opstype varchar(15), PhoneType varchar(10))
row format serde 'com.bizo.hive.serde.csv.CSVSerde'
with serdeproperties(
"separatorChar" = "\,",
"quoteChar" = "\"")
stored as textfile
;

Then load the data from your path using the query below:

load data local inpath 'path/xyz.csv' into table dnd;

and then run:

select * from dnd;
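As a sanity check on the serde properties above: the `separatorChar`/`quoteChar` pair is just standard CSV parsing. A rough Python equivalent mapping one input line onto the columns of the `dnd` table (illustrative only, not part of the Hive setup):

```python
import csv
import io

# Columns of the dnd table from the CREATE TABLE statement
columns = ["ServiceAreaCode", "PhoneNumber", "Preferences", "Opstype", "PhoneType"]

line = '"13","9827259163","0","D","2"'
# delimiter=',' and quotechar='"' mirror separatorChar and quoteChar
values = next(csv.reader(io.StringIO(line), delimiter=',', quotechar='"'))
row = dict(zip(columns, values))
print(row["PhoneNumber"])  # 9827259163
```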

Upvotes: 1

Suresh Ram

Reputation: 1034

In Hive, we can use a SerDe jar file to read data that is enclosed in double quotes.

For your problem, please refer to this link:

http://stackoverflow.com/questions/21156071/why-dont-hive-have-fields-enclosed-by-like-in-mysql

Upvotes: 0
