Reputation: 4661
I am rather new to Hadoop and Hive and would like an example of something that can easily be done with Hadoop but for which Hive is not a good fit.
Upvotes: 0
Views: 405
Reputation: 306
TF-IDF can be computed using Apache Hive with the Hivemall extension: https://github.com/myui/hivemall/wiki/TFIDF-calculation
Computing TF-IDF requires just two views and one query. Easy!
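As a rough illustration of why TF-IDF fits Hive's relational style, the sketch below mirrors a term-frequency "view", a document-frequency "view" and a final join "query" in plain Python. It is not Hivemall's code: `tf_view`, `df_view` and `tfidf_query` are hypothetical names, and it assumes the common variant tf = term count / document length and idf = ln(N / df). Hivemall's exact formula may differ.

```python
import math
from collections import Counter


def tf_view(docs):
    """Like a TF view: one row per (doc_id, term) with the relative term frequency."""
    rows = {}
    for doc_id, text in docs.items():
        words = text.split()
        if not words:  # skip empty documents to avoid dividing by zero
            continue
        for term, count in Counter(words).items():
            rows[(doc_id, term)] = count / len(words)
    return rows


def df_view(docs):
    """Like a DF view: the number of documents that contain each term."""
    df = Counter()
    for text in docs.values():
        df.update(set(text.split()))
    return df


def tfidf_query(docs):
    """Like the final query: join TF with DF on the term and compute tf * idf."""
    n_docs = len(docs)
    tf = tf_view(docs)
    df = df_view(docs)
    # Assumed idf variant: natural log of (number of documents / document frequency).
    return {(d, t): f * math.log(n_docs / df[t]) for (d, t), f in tf.items()}


docs = {1: "hive hadoop hive", 2: "hadoop pagerank"}
scores = tfidf_query(docs)
```

A term such as "hadoop" that occurs in every document gets an idf of ln(1) = 0, so it scores zero everywhere; each step is a grouping or a join, which is exactly what SQL-like queries express well.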
Upvotes: 1
Reputation: 26
Everything that is not a "relational workload" (i.e., anything you could not also do with a SQL database) is not very well suited to Hive. There is probably always some way to do it in Hive as well (mainly because user-defined functions, UDFs, are available), but it would not be easy.
You're differentiating between "Hadoop" and "Hive". However, "Hadoop" is a rather general term: it could mean HDFS (the distributed file system), YARN (the resource manager), or Hadoop MapReduce, an implementation of the MapReduce programming model described by Google. I assume you mean MapReduce when comparing Hadoop and Hive.
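For readers new to the model, here is a minimal single-process sketch of MapReduce's three phases (map, shuffle/group by key, reduce), using word count as the example. It only illustrates the programming model, not Hadoop's Java API; `map_reduce`, `wc_mapper` and `wc_reducer` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Tuple


def map_reduce(
    records: Iterable,
    mapper: Callable[[object], List[Tuple[Hashable, object]]],
    reducer: Callable[[Hashable, list], object],
) -> dict:
    # Map phase: each input record produces zero or more (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(mapper(record))
    # Shuffle phase: group all values by key (Hadoop does this across the cluster).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, with all values for that key.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def wc_mapper(line):
    # Emit (word, 1) for every word in the line.
    return [(word, 1) for word in line.split()]


def wc_reducer(word, counts):
    # Sum the ones emitted for this word.
    return sum(counts)


counts = map_reduce(["hive on hadoop", "hadoop mapreduce"], wc_mapper, wc_reducer)
# counts == {"hadoop": 2, "hive": 1, "mapreduce": 1, "on": 1}
```

Because the mapper and reducer are arbitrary code, MapReduce can express logic that has no natural SQL equivalent. The price is that you write and wire up these functions yourself instead of a declarative query.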
I would say computing PageRank, which can be done with MapReduce, is probably quite annoying to do in Hive. Another example would be computing TF-IDF with MapReduce.
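One likely reason PageRank is awkward as a query is that it is iterative: the same map/reduce round is repeated, each round's output feeding the next, until the ranks settle. A query language without a loop construct would have to be re-run once per round, for example from an external driver script. The sketch below is a simplified, single-process illustration under these assumptions: `pagerank_iteration` and `pagerank` are hypothetical names, the damping factor 0.85 and the fixed iteration count are conventional choices rather than anything from this thread, and dangling nodes (no out-links) simply lose their rank.

```python
from collections import defaultdict


def pagerank_iteration(graph, ranks, damping=0.85):
    """One MapReduce-style round of PageRank.

    graph: dict mapping node -> list of out-neighbours (every node is a key).
    ranks: dict mapping node -> current rank.
    """
    n = len(graph)
    # Map: every node emits (neighbour, share of its rank) pairs.
    emitted = []
    for node, neighbours in graph.items():
        for nb in neighbours:
            emitted.append((nb, ranks[node] / len(neighbours)))
    # Shuffle + reduce: sum the shares per destination node, then apply damping.
    incoming = defaultdict(float)
    for nb, share in emitted:
        incoming[nb] += share
    return {node: (1 - damping) / n + damping * incoming[node] for node in graph}


def pagerank(graph, iterations=50, damping=0.85):
    n = len(graph)
    ranks = {node: 1.0 / n for node in graph}
    # The loop is the awkward part for a pure query language: each pass would
    # be a separate job (or query) whose output is the next pass's input.
    for _ in range(iterations):
        ranks = pagerank_iteration(graph, ranks, damping)
    return ranks


graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)  # a symmetric cycle: every node stays at about 1/3
```

In Hadoop each round would be its own MapReduce job chained by a driver program, which is straightforward in Java but clumsy to express as a sequence of Hive queries.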
Upvotes: 1