Reputation: 33
I'm new here I am trying to do an analysis of tweets using Apache hive. I am able to retrieve tweets in csv.file
Then I simply create a table Create external table if not exists tweets3 ( id bigint, text string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
Then I use split function in text column to create another table.
Create table if not exists split_word as select id as id, split(text, ' ') as words from tweets3;
but when I run query select * from split_word this is what I see enter image description here words aren't separated
I am very new to this. please help
I guess maybe I should try to save tweets in json format ?
Upvotes: 1
Views: 26
Reputation: 7407
I have a better solution. You can have 100s of words in the tweet so you need to create many columns. Instead of columns, you can create rows with the words.
For ex, the string 'Air canada is bad...'
will look like this -
Air
canada
is
bad...
Solution -
with cte as (select split('Air canada is bad...',' ') as c)
select cte_ex words
from cte
LATERAL VIEW explode(c) exploded_table as cte_ex
Now, if you want you can easily pivot this data into different columns but i don't think it will be useful but its upto your requirement.
Upvotes: 1