Reputation: 91
While trying to run SQL query for PostgreSQL, instead of the column names from the table referred it it pushing down * to the database.
select
m.m_id,
cnt_c_no
from (
select
m_id
from pg_test_main.test1.table1
where
last_date >= '2019-01-01 00:00:00'
) as m
left join (
select
ci.m_id,
count(ci.c_no) as cnt_c_no
from (
select
m_id,
c_no
from pg_test.public.table2
) as ci
inner join (
select
c_no
from pg_test.public.table3
where
is_del = 'F'
) as c on ci.c_no = c.c_no
group by
ci.m_id
) as join1 on m.m_id = join1.m_id;
00-00 Screen
00-01 Project(m_id=[$0], cnt_c_no=[$1])
00-02 Project(m_id=[$0], cnt_c_no=[$2])
00-03 HashJoin(condition=[=($0, $1)], joinType=[left], semi-join: =[false])
00-05 Jdbc(sql=[SELECT "m_id" FROM "test1"."table1" WHERE "last_date" >= '2019-01-01 00:00:00' ])
00-04 Project(m_id0=[$0], cnt_c_no=[$1])
00-06 HashAgg(group=[{0}], cnt_c_no=[COUNT($1)])
00-07 Project(m_id=[$0], c_no=[$1])
00-08 HashJoin(condition=[=($1, $2)], joinType=[inner], semi-join: =[false])
00-10 Project(m_id=[$3], c_no=[$1])
00-12 Jdbc(sql=[SELECT * FROM "public"."table2" ])
00-09 Project(c_no0=[$0])
00-11 Project(c_no=[$0])
00-13 SelectionVectorRemover
00-14 Filter(condition=[=($60, 'F')])
00-15 Jdbc(sql=[SELECT * FROM "public"."table3" ])
As you can see, Jdbc Scan for table1 was using column names.
but, Jdbc Scan for table2 and table3 was not using column names. It pushed down * to the database.
How can I control jdbc scan so that it can push down colume names?
Apache Drill version is 1.16.0 (embedded-mode)
Upvotes: 1
Views: 149
Reputation: 661
I tried to reproduce it with MySQL on both Drill 1.17 and Drill 1.15, but for the query, similar to the query you have specified, all the query is pushed into the JDBC storage:
SELECT m.person_id,
cnt_c_no
FROM
(SELECT person_id
FROM mysql.`drill_mysql_test1`.person1
WHERE date_field >= '2019-01-01 00:00:00') AS m
LEFT JOIN
(SELECT ci.person_id,
count(ci.last_name) AS cnt_c_no
FROM
(SELECT person_id,
last_name
FROM mysql.`drill_mysql_test`.person) AS ci
INNER JOIN
(SELECT last_name
FROM mysql.`drill_mysql_test`.person2
WHERE boolean_field = 'F' ) AS c ON ci.last_name = c.last_name
GROUP BY ci.person_id) AS join1 ON m.person_id = join1.person_id
The plan for this query:
00-00 Screen
00-01 Project(person_id=[$0], cnt_c_no=[$1])
00-02 Jdbc(sql=[SELECT `t0`.`person_id`, `t5`.`cnt_c_no` FROM (SELECT `person_id` FROM `drill_mysql_test1`.`person1` WHERE `date_field` >= '2019-01-01 00:00:00') AS `t0` LEFT JOIN (SELECT `t1`.`person_id`, COUNT(`t1`.`last_name`) AS `cnt_c_no` FROM (SELECT `person_id`, `last_name` FROM `drill_mysql_test`.`person`) AS `t1` INNER JOIN (SELECT `last_name` FROM `drill_mysql_test`.`person2` WHERE `boolean_field` = 'F') AS `t3` ON `t1`.`last_name` = `t3`.`last_name` GROUP BY `t1`.`person_id`) AS `t5` ON `t0`.`person_id` = `t5`.`person_id` ])
Could you please provide CTAS for Postgres tables, so I will try to reproduce it again with specific data types. Also, if possible, please check whether this issue is still reproduced on Drill 1.17.
UPD: Comment under this answer helped to discover that this problem was caused by the following issue: https://issues.apache.org/jira/browse/DRILL-7340 and it will be resolved in Drill 1.18.0.
Upvotes: 1