bhomass

Reputation: 3572

Can't get Hive to accept a JSON file

I am following a simple Hive JSON SerDe tutorial, but I can't get Hive to accept a JSON file that looks totally correct.

{
"id": 596344698102419456,
"created_at": "Mon Apr 01 01:32:06 +0000 2013",
"source": "<a href="http://google.com" rel="nofollow">RihannaQuotes</a>",
"favorited": False
}

CREATE EXTERNAL TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
LOCATION '/user/flume/tweets';

After loading the data, Hive reports 0 rows:

Table default.tweets stats: [numFiles=1, numRows=0, totalSize=166, rawDataSize=0]

and select * from tweets; fails with this exception:

java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source: java.io.StringReader@45377ac1; line: 1, column: 0]) at [Source: java.io.StringReader@45377ac1; line: 1, column: 3]

Did I do anything wrong?

Upvotes: 2

Views: 3449

Answers (3)

Ajith Kumara

Reputation: 17

In my case, switching to a different SerDe made it work.

For example:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("ignore.malformed.json" = "true") 

Upvotes: 0

bhomass

Reputation: 3572

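The fix is simple: the entire record must be on a single line, with no embedded \n. Hive's default text input format treats each line of the file as a separate record, so a pretty-printed object reaches the SerDe one fragment at a time (the first being just "{"), which matches the "Unexpected end-of-input" error reported on line 1.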

{ "id": 596344698102419456, "created_at": "Mon Apr 01 01:32:06 +0000 2013", "source": "blank", "favorited": false }

This worked like a charm.
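
If the file was pretty-printed by some other tool, it can be flattened before loading it into the table location. The following is only a minimal sketch using Python's standard json module; the helper name to_single_line and the sample record are illustrative assumptions, not part of the tutorial.

```python
import json


def to_single_line(pretty_json: str) -> str:
    """Parse one JSON object and re-serialize it on a single line.

    Hypothetical helper for illustration only.
    """
    record = json.loads(pretty_json)
    # Without an indent argument, json.dumps emits no raw newlines;
    # a newline inside a string value is written as the escape \n.
    return json.dumps(record, separators=(",", ":"))


# Sample record modeled on the one in the question (values are illustrative).
pretty = """{
  "id": 596344698102419456,
  "created_at": "Mon Apr 01 01:32:06 +0000 2013",
  "source": "blank",
  "favorited": false
}"""

line = to_single_line(pretty)
print(line)
```

Writing one such line per record into the file under the table's LOCATION gives Hive one complete JSON object per row.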

Upvotes: 8

frb

Reputation: 3798

The problem is in this part of the JSON:

"source": "<a href="http://google.com" rel="nofollow">RihannaQuotes</a>",

From the JSON parser's point of view, that field's value ends at the second quote, i.e. it reads:

"source": "<a href="

And the rest is "garbage". Any online parser will confirm this.

You must escape the quotes inside the string value, like this:

{
    "id": 596344698102419456,
    "created_at": "Mon Apr 01 01:32:06 +0000 2013",
    "source": "<a href=\"http://google.com\" rel=\"nofollow\">RihannaQuotes</a>",
    "favorited": false
}
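
Instead of an online parser, the same check can be run locally. This is a rough sketch with Python's standard json module; the helper is_valid_json and the variable names are assumptions made for illustration. It shows that the unescaped value is rejected, and that serializing the HTML string with json.dumps produces the escaped form.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# The field as it appears in the question: inner quotes are not escaped.
broken = '{"source": "<a href="http://google.com" rel="nofollow">RihannaQuotes</a>"}'

# Let the serializer do the escaping instead of writing it by hand.
html = '<a href="http://google.com" rel="nofollow">RihannaQuotes</a>'
escaped = json.dumps({"source": html})

print(is_valid_json(broken))   # the parser stops after "<a href="
print(is_valid_json(escaped))
print(escaped)
```

Generating the records with a JSON library rather than by string concatenation avoids this class of error altogether.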

Upvotes: 1
