bit
bit

Reputation: 437

Import mixed data (string and array of strings) with hive

I am trying to import with hive some data of this type:

2015-0-6,value1

2015-0-9,value1,value2,value3

I tried in those ways

CREATE TABLE fields (timestamp STRING, fields_array ARRAY<STRING>)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

CREATE TABLE fields (timestamp STRING, fields_string STRING)
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

but only timestamp and first value are stored in the table. Eg.:

2015-0-6 value1

2015-0-9 value1

Why?

Upvotes: 1

Views: 492

Answers (2)

mat77
mat77

Reputation: 436

Try using the SerDe explained in this blog. I think its close to what your requirement is. https://bigdatamusings.wordpress.com/2014/08/

Upvotes: 1

Tab
Tab

Reputation: 309

You need to define a python script such as this:

import sys

for line in sys.stdin:
    splitLine=line.strip().split(',', 1)
    print "%s\t%s" % (splitLine[0], splitLine[1])

And in your hql type

CREATE TABLE fields (timestamp STRING, fields_string STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

ADD FILE /your_script_path/script.py;

INSERT OVERWRITE TABLE fields SELECT TRANSFORM(line) USING 'python script.py' AS timestamp, fields FROM your_table_in_which_you_have_one_column_called_LINE ;

Upvotes: 1

Related Questions