Ryan
Ryan

Reputation: 71

SQL and apache drill

New programmer with SQL and Apache Drill here. I'm trying to take this SQL command from DB1:

SELECT screen_name, job_id, count(*) as counter
from twitter.mention t
WHERE t.job_id = 290
or t.job_id = 261
or t.job_id = 303
group by screen_name, job_id
order by counter desc
limit 60;

I'm trying to take this block and run it through Apache Drill, which I'm very new to. Of note, I'm using two databases, but only one variable is changing: screen_name (changing to from_user_name). It works fine in SQL, but it does not work in drill. Here is the drill code:

statement = """
SELECT from_user_name, job_id, count(*) as counter
from twitter.tweet t
WHERE t.job_id = 290
or t.job_id = 261
or t.job_id = 303
group by from_user_name, job_id
order by counter desc
limit 60;"""

drill = PyDrill(host='host_name', port=8047)

if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')


rows = drill.query(statement, timeout = 120)

df = rows.to_dataframe()
df.head(20) 

Here is the error message:

TransportError: TransportError(500, '{\n  "errorMessage" : "PARSE 
ERROR: Encountered \\";\\" at line 9, column 9.\\nWas expecting one 
of:\\n    <EOF> \\n    \\"OFFSET\\" ...\\n    \\"FETCH\\" ...\\n    
\\n\\nSQL Query \\nSELECT from_user_name, job_id, count(*) as 
counter\\nfrom twitter.tweet t\\nWHERE t.job_id = 290\\nor t.job_id = 
261\\nor t.job_id = 303\\ngroup by from_user_name, job_id\\norder by 
counter desc\\nlimit 60;\\n        ^\\n\\n\\n[Error Id: 78df6d24-686b-
496f-8795-9b3d21d75740 on c04.h-spark.cgi.missouri.edu:31010]"\n}')

Upvotes: 2

Views: 640

Answers (1)

tobi6
tobi6

Reputation: 8239

This depends strongly on your mapping in the Storage Plugins section. How did you name the storage components? E.g. if you connected to a MySQL database, the storage plugin might be called mysql.

Now, if you want to access data from a specific table, you first need to tell Drill the storage plugin. How should Drill otherwise know in which connection this table resides?

So for example with a MySQL connection called mysql, a database "mydb", a table "twitterfeed" and a column "message" the SELECT would look like this:

SELECT * FROM mysql.mydb.twitterfeed.message LIMIT 10;

This could easily be tested from within the Drill web UI which offers a very simple Query dialog.

Other data sources can be joined in this manner, e.g. with a table in a MongoDB:

SELECT * FROM mysql.mydb.twitterfeed.message 
JOIN mongodb.thedb.tablename ON mysql.mydb.twitterfeed.id = mongodb.thedb.tablename.externalid
LIMIT 10;

I did use SELECT * only for example purposes. Obviously this should always hold the correct column names and no asterik. If you are only using one database you could also use USE.

Upvotes: 0

Related Questions