user3013086

Reputation: 43

Hadoop Hive SQL: Create External Table from an oddly formatted file

At the moment I have an initial system working which reads in a file with each line format looking like this:

REVISION 12 30364918 Anarchism 2005-12-06T17:44:47Z RJII 141644

with this code:

CREATE EXTERNAL TABLE mytable(type STRING, aid BIGINT, rid BIGINT, title STRING, ts STRING, uname STRING, uid STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/my/local/path/to/file';

But now I have a file with each line having a format like this:

2001-04-29T15:43:48Z    [10656,251129]

(note that there is a tab after the timestamp)

And I have no idea how to pull data from it. I've looked online for suggestions but I can't seem to find anything.

Theoretically, I would want to get this from it:

ts STRING -> 2001-04-29T15:43:48Z
aid BIGINT -> 10656
rid BIGINT -> 251129

Upvotes: 1

Views: 517

Answers (1)

Jerome Banks

Reputation: 1630

Looks like the second field is a JSON array. You can create a table mapping onto the actual underlying structure, and then define a view to extract the values. The "from_json" and "json_split" UDFs from Brickhouse (http://github.com/klout/brickhouse) can parse the JSON for you:

CREATE EXTERNAL TABLE mytable(datestr STRING, jsonArray STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE LOCATION '/my/local/path/to/file';

CREATE VIEW myview AS
SELECT datestr, numArr[0] AS aid, numArr[1] AS rid
FROM ( SELECT datestr,
              -- the template argument tells from_json to return an array<bigint>
              from_json( jsonArray, array( cast(1 as bigint) ) ) AS numArr
       FROM mytable ) t;
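
If installing Brickhouse isn't an option, the view could instead be built from Hive's built-in regexp_replace and split functions. This is a rough sketch, assuming the second field always holds exactly two integers in brackets:

CREATE VIEW myview AS
SELECT datestr,
       -- strip the brackets, split on the comma, and cast each piece to BIGINT
       CAST(split(regexp_replace(jsonArray, '[\\[\\]]', ''), ',')[0] AS BIGINT) AS aid,
       CAST(split(regexp_replace(jsonArray, '[\\[\\]]', ''), ',')[1] AS BIGINT) AS rid
FROM mytable;

Either way, SELECT * FROM myview should give you rows like 2001-04-29T15:43:48Z, 10656, 251129.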

Upvotes: 1
