Reputation: 424
I am trying to create a table in hive 3.0 using the following schema i found online:
CREATE TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT< text : STRING, user : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
entities STRUCT< urls : ARRAY<STRUT<expanded_url : STRING>>,
user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
hashtags : ARRAY<STRUCT<text : STRING>>>,
text STRING,
user STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JSONSerDe';
when i press enter the NoViableAltException. I am using hive first time have no experience can someone tell me what is wrong with the schema?
Upvotes: 1
Views: 130
Reputation: 31540
User is Reserved keyword in case if we are using keywords in hive then we need to enclose the keyword with `(backticks)
Example:
`user`
Try with below create table statement
CREATE TABLE tweets (
id BIGINT,
created_at STRING,
source STRING,
favorited BOOLEAN,
retweeted_status STRUCT< text : STRING, `user` : STRUCT<screen_name : STRING,name : STRING>, retweet_count : INT>,
entities STRUCT< urls : ARRAY<STRUCT<expanded_url : STRING>>,
user_mentions : ARRAY<STRUCT<screen_name : STRING,name : STRING>>,
hashtags : ARRAY<STRUCT<text : STRING>>>,
text STRING,
`user` STRUCT< screen_name : STRING, name : STRING, friends_count : INT, followers_count : INT, statuses_count : INT, verified : BOOLEAN, utc_offset : INT, time_zone : STRING>,
in_reply_to_screen_name STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
Location '/user/flume/tweets/';
I am able to create table with above ddl:
desc tweets;
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
| col_name | data_type | comment |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
| id | bigint | from deserializer |
| created_at | string | from deserializer |
| source | string | from deserializer |
| favorited | boolean | from deserializer |
| retweeted_status | struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int> | from deserializer |
| entities | struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> | from deserializer |
| text | string | from deserializer |
| user | struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string> | from deserializer |
| in_reply_to_screen_name | string | from deserializer |
+--------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--+
UPDATE:
Hive works as schema on read when we run select statement hive looks for files in the directory where table is pointing(/user/hive/warehouse/tweets/) to then read those data as per your ddl statement but in this case data is not present in the directory so select statement is not returning any records.
To Fix this issue:
Option1. Move the data from /user/flume/tweets/
to /user/hive/warehouse/tweets/
directory then you are able to select the data from the table.
`hadoop fs -mv /user/flume/tweets/ /user/hive/warehouse/tweets/`
(or)
Option2. We need to create hive table on top of /user/flume/tweets/
this directory then you are able to see the data in tweets table(use the above create table statement for that).
Upvotes: 1