user1265125
user1265125

Reputation: 2656

Unable to use JSON Serde in Hive

This is all according to this guide: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/

hive> ADD JAR /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar

In /tmp/new I have a file abc.json which contains the following: http://pastie.org/9504218

The CREATE EXTERNAL TABLE command runs properly, but it doesn't take in any data:

hive> 
    > CREATE EXTERNAL TABLE tweets (
    >   id BIGINT,
    >   created_at STRING,
    >   source STRING,
    >   favorited BOOLEAN,
    >   retweeted_status STRUCT<
    >     text:STRING,
    >     user:STRUCT<screen_name:STRING,name:STRING>,
    >     retweet_count:INT>,
    >   entities STRUCT<
    >     urls:ARRAY<STRUCT<expanded_url:STRING>>,
    >     user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    >     hashtags:ARRAY<STRUCT<text:STRING>>>,
    >   text STRING,
    >   user STRUCT<
    >     screen_name:STRING,
    >     name:STRING,
    >     friends_count:INT,
    >     followers_count:INT,
    >     statuses_count:INT,
    >     verified:BOOLEAN,
    >     utc_offset:INT,
    >     time_zone:STRING>,
    >   in_reply_to_screen_name STRING
    > ) 
    > PARTITIONED BY (datehour INT)
    > ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
    > LOCATION '/tmp/new';
OK
Time taken: 0.142 seconds

Select *:

hive> select * from tweets;
OK
Time taken: 0.392 seconds

What could be going on here?

Upvotes: 1

Views: 606

Answers (1)

Mukesh S
Mukesh S

Reputation: 2876

So the issue is you have created an External table with partition. But you have not added the partition in Hive nor you have created the directory in HDFS that way.

Here are the steps you can follow:

1.) Run the create table statement.
2.) In the directory /tmp/new/ create a sub directory datehour=<some int value>, and then put your .json file inside this.
3.) Run alter table statement adding this partition to metadata:
    alter table tweets add partition(datehour=<some int value>);
4.) Now run the select statement.

Hope it helps...!!!

Upvotes: 1

Related Questions