Reputation: 79
I ran into a problem doing a word count in Hive.
My query is:
select word, count(1) as count
from (select explode(split(word, ' ' )) as word from note) w
group by word
order by count desc
limit 5
;
Result:
the 20583
of 10388
9479
and 7611
in 5226
The 9479 row has an empty word, and 9479 is the number of lines. How do I get rid of it?
Upvotes: 0
Views: 86
Reputation: 44991
Change the split call to:
split(word,'\\s+')
This splits on a run of whitespace characters, [ \t\n\x0B\f\r], instead of a single space.
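For reference, here is a sketch of the full query with that change applied, assuming the same `note` table and `word` column as in the question. The `where word <> ''` filter and the `cnt` alias are additions of mine, not part of the suggestion above. The filter is a guard, since a line that is empty or starts with whitespace can still yield an empty token even when splitting on `\\s+`.

```sql
-- Word count over the note table, splitting each line on runs of whitespace.
select word, count(1) as cnt
from (
    -- '\\s+' matches one or more whitespace characters (space, tab, newline, ...)
    select explode(split(word, '\\s+')) as word
    from note
) w
-- Assumption: drop any empty tokens that survive the split
where word <> ''
group by word
order by cnt desc
limit 5
;
```

If the empty row disappears with the `split` change alone, the filter is harmless and can be dropped.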
Upvotes: 1