Reputation: 2656
This is all according to this guide: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/
hive> ADD JAR /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hadoop/hive-serdes-1.0-SNAPSHOT.jar
In /tmp/new I have a file abc.json, which contains the following: http://pastie.org/9504218
The CREATE EXTERNAL TABLE command runs properly, but it doesn't take in any data:
hive>
> CREATE EXTERNAL TABLE tweets (
> id BIGINT,
> created_at STRING,
> source STRING,
> favorited BOOLEAN,
> retweeted_status STRUCT<
> text:STRING,
> user:STRUCT<screen_name:STRING,name:STRING>,
> retweet_count:INT>,
> entities STRUCT<
> urls:ARRAY<STRUCT<expanded_url:STRING>>,
> user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
> hashtags:ARRAY<STRUCT<text:STRING>>>,
> text STRING,
> user STRUCT<
> screen_name:STRING,
> name:STRING,
> friends_count:INT,
> followers_count:INT,
> statuses_count:INT,
> verified:BOOLEAN,
> utc_offset:INT,
> time_zone:STRING>,
> in_reply_to_screen_name STRING
> )
> PARTITIONED BY (datehour INT)
> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
> LOCATION '/tmp/new';
OK
Time taken: 0.142 seconds
Select *:
hive> select * from tweets;
OK
Time taken: 0.392 seconds
What could be going on here?
Upvotes: 1
Views: 606
Reputation: 2876
The issue is that you have created an external table with a partition, but you have neither added the partition to the Hive metastore nor created the corresponding partition directory in HDFS.
Here are the steps you can follow:
1.) Run the create table statement.
2.) Under the table location /tmp/new/ create a subdirectory named datehour=<some int value>, then put your .json file inside it.
3.) Run an alter table statement to register this partition in the metastore:
alter table tweets add partition(datehour=<some int value>);
4.) Now run the select statement.
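The steps above can be sketched as a session against the cluster. This assumes abc.json already sits in HDFS under /tmp/new, and the value 2014091513 is just an example placeholder for the datehour partition key:

```shell
# Create the partition directory under the table location
# (datehour=2014091513 is an example value; pick the integer that suits your data)
hadoop fs -mkdir -p /tmp/new/datehour=2014091513

# Move the data file into the partition directory
hadoop fs -mv /tmp/new/abc.json /tmp/new/datehour=2014091513/

# Register the partition with the metastore, then verify and query
hive -e "ALTER TABLE tweets ADD PARTITION (datehour=2014091513);"
hive -e "SHOW PARTITIONS tweets;"
hive -e "SELECT * FROM tweets;"
```

Hive only reads files that live under directories of registered partitions, which is why a file sitting directly in /tmp/new is invisible to the partitioned table.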
Hope it helps!
Upvotes: 1