S.P

Reputation: 79

Problems with wordcount in hive

I ran into a problem doing a word count in Hive.

My Hive query is:

select word, count(1) as count 
from (select explode(split(word, ' ' )) as word from note) w   
group by word 
order by count desc 
limit 5
;

Result:

the 20583
of  10388
     9479
and  7611
in   5226

The third row has an empty word with a count of 9479, which is the number of lines. How do I get rid of it?

Upvotes: 0

Views: 86

Answers (1)

David דודו Markovitz

Reputation: 44991

Change the split call to:

split(word,'\\s+')

(instead of a single space, this splits on a series of whitespace characters [ \t\n\x0B\f\r])
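For reference, here is a sketch of the full query with that change applied, assuming the table `note` and column `word` from the question. The alias `cnt` and the `where word <> ''` filter are additions for illustration, not part of the original query: the regex split collapses runs of whitespace, but blank lines or lines with leading or trailing whitespace may still yield an empty token, so the filter drops those explicitly.

```sql
-- Assumes the table `note` with a string column `word`, as in the question.
select word, count(1) as cnt
from (
  -- split on runs of whitespace instead of a single space
  select explode(split(word, '\\s+')) as word
  from note
) w
where word <> ''   -- drop empty tokens left by blank lines or edge whitespace
group by word
order by cnt desc
limit 5;
```

If the empty row disappears with the `\\s+` split alone, the filter is harmless and can be removed.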

Upvotes: 1
