Reputation: 437
I am trying to import with hive some data of this type:
2015-0-6,value1
2015-0-9,value1,value2,value3
I tried in those ways
CREATE TABLE fields (timestamp STRING, fields_array ARRAY<STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
CREATE TABLE fields (timestamp STRING, fields_string STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
but only timestamp and first value are stored in the table. Eg.:
2015-0-6 value1
2015-0-9 value1
Why?
Upvotes: 1
Views: 492
Reputation: 436
Try using the SerDe explained in this blog. I think its close to what your requirement is. https://bigdatamusings.wordpress.com/2014/08/
Upvotes: 1
Reputation: 309
You need to define a python script such as this:
import sys
for line in sys.stdin:
splitLine=line.strip().split(',', 1)
print "%s\t%s" % (splitLine[0], splitLine[1])
And in your hql type
CREATE TABLE fields (timestamp STRING, fields_string STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
ADD FILE /your_script_path/script.py;
INSERT OVERWRITE TABLE fields SELECT TRANSFORM(line) USING 'python script.py' AS timestamp, fields FROM your_table_in_which_you_have_one_column_called_LINE ;
Upvotes: 1