Pitchaporn Keenaphn
Pitchaporn Keenaphn

Reputation: 33

Split words in Hive table

I'm new here I am trying to do an analysis of tweets using Apache hive. I am able to retrieve tweets in csv.file

Then I simply create a table Create external table if not exists tweets3 ( id bigint, text string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';

enter image description here

Then I use split function in text column to create another table.

Create table if not exists split_word as select id as id, split(text, ' ') as words from tweets3;

but when I run query select * from split_word this is what I see enter image description here words aren't separated

I am very new to this. please help

I guess maybe I should try to save tweets in json format ?

Upvotes: 1

Views: 26

Answers (1)

Koushik Roy
Koushik Roy

Reputation: 7407

I have a better solution. You can have 100s of words in the tweet so you need to create many columns. Instead of columns, you can create rows with the words.

For ex, the string 'Air canada is bad...' will look like this -

Air
canada
is
bad...

Solution -

with cte as (select split('Air canada is bad...',' ') as c)
select cte_ex words
from cte
LATERAL VIEW explode(c) exploded_table as cte_ex
  1. First we split the whole tweet into array of words like your solution.
  2. Then explode the array to get individual words in different row.

Now, if you want you can easily pivot this data into different columns but i don't think it will be useful but its upto your requirement.

enter image description here

Upvotes: 1

Related Questions