Dennis P

Reputation: 71

Presto and Hive

I'm trying to enable basic SQL querying of CSV files located in an S3 directory. Presto seemed like a natural fit (the files are tens of GB). While going through the Presto setup, I tried creating a table using the Hive connector. It was not clear to me whether I only need the Hive metastore to save my table configurations in Presto, or whether I have to create the tables in Hive first.

The documentation makes it seem that you can use Presto without having to configure Hive, while still using Hive syntax. Is that accurate? In my experience so far, I have not been able to connect to AWS S3.

Upvotes: 3

Views: 1479

Answers (3)

Ezra Justin Lee

Reputation: 81

I know it's been a while, but if this question is still outstanding, have you considered using Spark? Spark can read CSV files stored in S3 out of the box and query or process them.

Also, I'm curious: what solution did you end up implementing to resolve your issue?

Upvotes: 0

Sayat Satybald

Reputation: 6590

It is not possible to use vanilla Presto to analyze data on S3 without Hive. Presto provides only a distributed execution engine and has no table metadata of its own. The Presto coordinator therefore needs Hive (specifically, the Hive metastore) to retrieve the table metadata it requires to parse and execute a query.

However, you can use AWS Athena, which is a managed Presto service, to run queries on top of S3.

Another option: the recent 0.198 release of Presto adds the ability to connect to AWS Glue and retrieve table metadata for files in S3.
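As a rough sketch of what the catalog configuration might look like, assuming a Hive metastore reachable over Thrift: the host name and credentials below are placeholders, and the property names are the ones I recall from the Presto Hive connector documentation, so check them against your version.

```properties
# etc/catalog/hive.properties (placeholder values, adjust for your setup)
connector.name=hive-hadoop2

# Hypothetical metastore host; Presto reads table metadata from here
hive.metastore.uri=thrift://metastore.example.net:9083

# S3 credentials (placeholders); omit if the instance uses an IAM role
hive.s3.aws-access-key=YOUR_ACCESS_KEY
hive.s3.aws-secret-key=YOUR_SECRET_KEY

# Alternative on 0.198+: use AWS Glue instead of a Thrift metastore
# hive.metastore=glue
```

Either way, the table definitions live in the metastore (or Glue), and Presto only reads them at query time.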

Upvotes: 1

Ezra Justin Lee

Reputation: 81

Presto syntax is similar to Hive syntax. For most simple queries, the same statement works in both. However, there are some key differences, so the two dialects are not interchangeable. For example, in Hive you might use LATERAL VIEW EXPLODE, whereas in Presto you'd use CROSS JOIN UNNEST. There are many such subtle syntactic differences between the two.
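To make the EXPLODE/UNNEST difference concrete, here is a small sketch. The `orders` table and its `items` array column are hypothetical, made up for illustration. The two query strings show the same row expansion in each dialect, and the Python function just emulates the result so you can see what both are meant to return.

```python
# Hive: expand an array column with LATERAL VIEW EXPLODE.
# (orders / items are hypothetical names, not from any real schema.)
HIVE_QUERY = """
SELECT o.order_id, t.item
FROM orders o
LATERAL VIEW EXPLODE(o.items) t AS item
"""

# Presto: the same expansion written with CROSS JOIN UNNEST.
PRESTO_QUERY = """
SELECT o.order_id, t.item
FROM orders o
CROSS JOIN UNNEST(o.items) AS t (item)
"""


def explode(rows):
    """Emulate both queries: one output row per array element.

    Rows whose array is empty produce no output, which matches
    LATERAL VIEW EXPLODE (without OUTER) and CROSS JOIN UNNEST.
    """
    return [(order_id, item) for order_id, items in rows for item in items]


orders = [(1, ["apple", "pear"]), (2, ["plum"]), (3, [])]
print(explode(orders))
```

Note that order 3 disappears in both dialects because its array is empty; in Hive you would need LATERAL VIEW OUTER to keep it.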

Upvotes: 1
