Reputation: 4225
With Reference to one of the question answer before, Getting counts/totals at each level of a hierarchical query using CONNECT BY
DIRS
====
DIR_ID
PARENT_DIR_ID
DIR_NAME
FILES
=====
FILE_ID
FILE_NAME
DIR_ID
FILE_SIZE
If DIRS contains:
DIR_ID PARENT_DIR_ID DIR_NAME
====== ============= ========
1 ROOT
2 1 DIR1_1
3 1 DIR1_2
4 2 DIR2_1
5 2 DIR2_2
Out Out is shown below.
FILE_ID FILE_NAME DIR_ID FILE_SIZE
======= ========= ====== =========
1 test1.txt 5 100
2 test2.txt 5 200
3 test5.txt 5 50
4 test3.txt 3 300
5 test4.txt 3 300
6 test6.txt 4 100
I want a query that returns the path along with the number of files in or below each node in the hierarchy. Basically a rollup of the number of files. So the query result would look something like:
Path File_Count File_Size
===== =========== --------
/ROOT 6 1050
/ROOT/DIR1_1 4 450
/ROOT/DIR1_1/DIR2_1 1 100
/ROOT/DIR1_1/DIR2_2 3 350
/ROOT/DIR1_2 2 600
1 select sys_connect_by_path(dp.dir_name, '/') path
2 ,(select count(file_id)
3 from dirs dc
4 ,files f
5 where f.dir_id(+) = dc.dir_id
6 connect by prior dc.dir_id = dc.parent_dir_id
7 start with dc.dir_id = dp.dir_id
8 ) count
9 from dirs dp
10 connect by prior dp.dir_id = dp.parent_dir_id
11* start with dp.parent_dir_id is null
How can i also get count of File size column next to size column.
This Query is doing fine except sum of the count of file size.
(select count(file_id)
from dirs dc
,files f
where f.dir_id(+) = dc.dir_id
connect by prior dc.dir_id = dc.parent_dir_id
start with dc.dir_id = dp.dir_id
) count,
(select sum(file_size)
from dirs dc
,files f
where f.dir_id(+) = dc.dir_id
connect by prior dc.dir_id = dc.parent_dir_id
start with dc.dir_id = dp.dir_id
) size
from dirs dp
connect by prior dp.dir_id = dp.parent_dir_id
start with dp.parent_dir_id is null
I modified the query to get size of files of each level but no success. Any idea what is missing?
All i need is the count of the File_Size column as shown above.
Upvotes: 1
Views: 1630
Reputation: 3126
I thought up multiple solutions, might as well list them all here.
Subquery approach
This is what you use - list every directory, do two recursive subqueries for each directory. I basically repeated what you wrote here, and it works as expected. Your error was to use reserved words count
and size
for column names.
select sys_connect_by_path(dp.dir_name, '/') path,
(select count(file_id)
from dirs dc, files f
where f.dir_id(+) = dc.dir_id
connect by prior dc.dir_id = dc.parent_dir_id
start with dc.dir_id = dp.dir_id
) cnt,
(select sum(file_size)
from dirs dc, files f
where f.dir_id(+) = dc.dir_id
connect by prior dc.dir_id = dc.parent_dir_id
start with dc.dir_id = dp.dir_id
) sz
from dirs dp
connect by prior dp.dir_id = dp.parent_dir_id
start with dp.parent_dir_id is null;
However, on larger datasets this can be quite slow. There are other ways.
Straightforward join approach
This one is not very efficient and has some restraints, but seems fairly easy to understand. It executed a bit faster than subquery for me.
1) Get full directory hierarchy
2) Join files against (1) to find out full dirname for each file
3) Join (1) and (2) on file_dirname like dirname||'%'
to get a list of files which belong to specified directory or any of its subdirectories, then group by (1).dir_id
and get your sums.
Note that dirnames must all end with '/' for this to work properly; also, directory full path must be unique.
It will look something like that:
with dirs_r as
(select sys_connect_by_path(dp.dir_name, '/')||'/' path, dir_id
from dirs dp
connect by prior dp.dir_id = dp.parent_dir_id
start with dp.parent_dir_id is null
)
select dirs_r.path, count(*), sum(files_q.file_size)
from dirs_r,
(select d.path, f.file_size
from files f, dirs_r d
where d.dir_id = f.dir_id) files_q
where files_q.path like dirs_r.path||'%'
group by dirs_r.path
order by dirs_r.path
Reverse approach:
This might be a bit harder to understand, but is much more efficient according to explain plan, and it also only uses IDs, so it's more reliable. It also executed much faster than both of the above.
1) Build a reverse directory hierarchy for every directory using dp.dir_id = prior dp.parent_dir_id
and connect_by_root(dir_id)
. This way, you get rows for each directory id and all its parent directories.
2) Join (1) with files, get rows for each parent directory of each file.
3) Join (2) with direct hierarchy on dir_id and then group.
select dirs_r.path, count(*), sum(filedir.file_size) from
(select sys_connect_by_path(dp.dir_name, '/') path, dir_id
from dirs dp
connect by prior dp.dir_id = dp.parent_dir_id
start with dp.parent_dir_id is null
) dirs_r,
(select f.file_size, dirs_reverse.dir_id dir_id
from files f,
(select connect_by_root(dp.dir_id) inner_dir_id, dp.dir_id
from dirs dp
connect by dp.dir_id = prior dp.parent_dir_id
) dirs_reverse
where dirs_reverse.inner_dir_id = f.dir_id) filedir
where dirs_r.dir_id = filedir.dir_id
group by dirs_r.path
order by dirs_r.path
Upvotes: 1