Reputation: 71
New programmer with SQL and Apache Drill here. I'm trying to take this SQL command from DB1:
SELECT screen_name, job_id, count(*) as counter
from twitter.mention t
WHERE t.job_id = 290
or t.job_id = 261
or t.job_id = 303
group by screen_name, job_id
order by counter desc
limit 60;
I'm trying to take this block and run it through Apache Drill, which I'm very new to. Of note, I'm using two databases, but only one variable is changing: screen_name (changing to from_user_name). It works fine in SQL, but it does not work in drill. Here is the drill code:
statement = """
SELECT from_user_name, job_id, count(*) as counter
from twitter.tweet t
WHERE t.job_id = 290
or t.job_id = 261
or t.job_id = 303
group by from_user_name, job_id
order by counter desc
limit 60;"""
drill = PyDrill(host='host_name', port=8047)
if not drill.is_active():
raise ImproperlyConfigured('Please run Drill first')
rows = drill.query(statement, timeout = 120)
df = rows.to_dataframe()
df.head(20)
Here is the error message:
TransportError: TransportError(500, '{\n "errorMessage" : "PARSE
ERROR: Encountered \\";\\" at line 9, column 9.\\nWas expecting one
of:\\n <EOF> \\n \\"OFFSET\\" ...\\n \\"FETCH\\" ...\\n
\\n\\nSQL Query \\nSELECT from_user_name, job_id, count(*) as
counter\\nfrom twitter.tweet t\\nWHERE t.job_id = 290\\nor t.job_id =
261\\nor t.job_id = 303\\ngroup by from_user_name, job_id\\norder by
counter desc\\nlimit 60;\\n ^\\n\\n\\n[Error Id: 78df6d24-686b-
496f-8795-9b3d21d75740 on c04.h-spark.cgi.missouri.edu:31010]"\n}')
Upvotes: 2
Views: 640
Reputation: 8239
This depends strongly on your mapping in the Storage Plugins section. How did you name the storage components? E.g. if you connected to a MySQL database, the storage plugin might be called mysql.
Now, if you want to access data from a specific table, you first need to tell Drill the storage plugin. How should Drill otherwise know in which connection this table resides?
So for example with a MySQL connection called mysql, a database "mydb", a table "twitterfeed" and a column "message" the SELECT
would look like this:
SELECT * FROM mysql.mydb.twitterfeed.message LIMIT 10;
This could easily be tested from within the Drill web UI which offers a very simple Query
dialog.
Other data sources can be joined in this manner, e.g. with a table in a MongoDB:
SELECT * FROM mysql.mydb.twitterfeed.message
JOIN mongodb.thedb.tablename ON mysql.mydb.twitterfeed.id = mongodb.thedb.tablename.externalid
LIMIT 10;
I did use SELECT *
only for example purposes. Obviously this should always hold the correct column names and no asterik. If you are only using one database you could also use USE
.
Upvotes: 0