Reputation: 221
I have a table with 200mil rows.
Lets call the table employee_internet_history
.
Row : employee_fullname || website || date || more data
.
The table have an index on employee_fullname
column.
I also have another table eu_employees
.
with 100 rows; each Row: employee_fullname || more data
.
I want to create a query to select top 3 websites surfed by each employee.
I am using Oracle Database
so i thought about using PL/SQL
to achieve this.
Currently i am using
declare
cursor top100workers is
select * from eu_employees
where rownum < 100;
begin
for worker in top100workers
LOOP
DBMS_OUTPUT.PUT_LINE(worker.employee_fullname ||' top 3 webpages:');
for TOP3 in (
SELECT /*+ parallel*/ website,
COUNT(website) AS num
from employee_internet_history
WHERE employee_internet_history.employee_fullname = worker.employee_fullname
group by website
order by num desc
)
LOOP
DBMS_OUTPUT.PUT_LINE('website = ' || TOP3.website || ' ,times surferd: '||top3.num);
end loop;
end LOOP;
end;
/
For each employee this query takes around 200 seconds.
And my real eu_employee table has over 8000 records.
Which Means it will take 19 days to count this using my way.
1) How can i speed things up?
2) Why dose it take so long?
If all the records of an employee are indexed, it should take O(1) to find them, and count them.
Also the queries are not depended on each other, 3) can i run few queries parallel ?
4) I saw there are several hints to run things at parallel mode, which one will fit best for my needs ?
5) is there a solution without using pl/sql
?
Upvotes: 0
Views: 1014
Reputation: 1269543
In general, a "set-based" approach (using a query) is going to be faster than using PL/SQL.
The following query does what you want:
select eih.*
from (select employee_id, website, count(*) as cnt,
row_number() over (partition by employee_id order by count(*) desc) as seqnum
from employee_internet_history eih
group by employee_id, website
) eih
where seqnum <= 3;
I'm not sure if you can make it run faster, because you do have to start by aggregating the data at the employee/website level. If you want more employee information, then join in eu_employees
.
By the way, using employee_fullname
for the join key is a really bad idea. People might change their names throughout their lives for various reasons.
Let me also add that your query would probably run much faster with an index on employee_internet_history(employee_fullname, website)
. You can also leave out the join
to the worker information. At least nothing in the question suggests that it is needed (unless it is used for filtering).
EDIT:
The performance is highly dependent on your hardware and memory. You can speed the query by using a subset of employees by joining them in:
select eih.*
from (select employee_id, website, count(*) as cnt,
row_number() over (partition by employee_id order by count(*) desc) as seqnum
from employee_internet_history eih join
(select ee.*
from eu_employees ee
where rownum < 100
) ee
on eih.employee_id = w.employee_id
group by employee_id, website
) eih
where seqnum <= 3;
Upvotes: 4