Reputation: 53498
I'm doing a disk space report that uses File::Find to collect cumulative sizing in a directory tree.
What I get (easily) from File::Find is the directory name, e.g.:
/path/to/user/username/subdir/anothersubdir/etc
I'm running File::Find to collect sizes beneath:
/path/to/user/username
and building a cumulative size report of the directory and each of its subdirectories.
What I've currently got is:
while ($dir_tree) {
    $results{$dir_tree} += $blocks * $block_size;
    my @path_arr = split( "/", $dir_tree );
    pop(@path_arr);
    $dir_tree = join( "/", @path_arr );
}
(And yes, I know that's not very nice.)
The purpose of doing this is so that when I stat each file, I add its size to the current node and to each parent node in the tree.
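To illustrate the rollup, here's a trace of that loop for a single file (path taken from the example below):
# Trace for $dir_tree = 'username/documents/excel':
#   pass 1: $results{'username/documents/excel'} += $blocks * $block_size;
#   pass 2: $results{'username/documents'}       += $blocks * $block_size;
#   pass 3: $results{'username'}                 += $blocks * $block_size;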
This is sufficient to generate:
username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M
But I'd like to now turn that into JSON, more like this:
{
"name" : "username",
"size" : "307200",
"children" : [
{
"name" : "documents",
"size" : "153750",
"children" : [
{
"name" : "excel",
"size" : "51200"
},
{
"name" : "word",
"size" : "81920"
}
]
}
]
}
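(As an aside, that shape falls straight out of nested hashrefs and arrayrefs once encoded; a minimal sketch using the JSON CPAN module, with hypothetical values:)
use JSON qw( encode_json );

# Hypothetical nested structure matching the target shape:
my $tree = {
    name     => 'username',
    size     => 307200,
    children => [
        { name => 'documents', size => 153600, children => [] },
    ],
};
print encode_json($tree);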
That's because I'm intending to do a D3 visualisation of this structure - loosely based on D3's Zoomable Circle Pack.
So my question is this - what is the neatest way to collate my data such that I can have cumulative (and ideally non-cumulative) sizing information, while populating the hash hierarchically?
I was thinking in terms of a 'cursor' approach (and using File::Spec this time):
use File::Spec;

my $data   = {};
my $cursor = $data;
foreach my $element ( File::Spec->splitdir($File::Find::dir) ) {
    $cursor->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element} //= {};
}
Although... that's not quite creating the data structure I'm looking for, not least because we basically have to search by hash key to do the 'rolling up' part of the process.
Is there a better way of accomplishing this?
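(One wrinkle with that approach: File::Spec->splitdir on an absolute path returns a leading empty element, which would become an empty hash key in the cursor loop unless it's skipped. A quick demonstration, with a hypothetical path:)
use File::Spec;

my @parts = File::Spec->splitdir('/home/user1/documents');
# @parts is ('', 'home', 'user1', 'documents') on Unix;
# the leading '' comes from the root '/'.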
Edit - a more complete example of what I have already:
#!/usr/bin/env perl
use strict;
use warnings;

use File::Find;
use Data::Dumper;

my $block_size = 1024;

sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;

    # Trim the last path element so results are keyed relative to
    # the user's parent directory (e.g. '/home/user1' -> '/home/').
    $starting_path =~ s,/\w+$,/,;

    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my ($dev,   $ino,     $mode, $nlink, $uid,
            $gid,   $rdev,    $size, $atime, $mtime,
            $ctime, $blksize, $blocks
        ) = stat($File::Find::name);

        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^\Q$starting_path\E||;

        # Add this file's size to its directory and every ancestor.
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $$results_ref{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}

my @users = qw( user1 user2 );
foreach my $user (@users) {
    my $path = "/home/$user";
    print "$path\n";
    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1,
        },
        $path
    );

    print Dumper \%results;
    # Would print this to a file in the homedir - to STDOUT for convenience.
    foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
        print "$key => $results{$key}\n";
    }
}
And yes - I know this isn't portable and does a few somewhat nasty things; part of what I'm doing here is trying to improve on that. (But currently it's a Unix-based homedir structure, so that's fine.)
Upvotes: 6
Views: 1015
Reputation: 386361
If you do your own dir scanning instead of using File::Find, you naturally get the right structure.
sub _scan {
    my ($qfn, $fn) = @_;    # $qfn: qualified path, $fn: basename

    my $node = { name => $fn };

    # lstat so symlinks aren't followed; '_' reuses the stat buffer below.
    lstat($qfn)
        or die $!;

    my $size   = -s _;
    my $is_dir = -d _;

    if ($is_dir) {
        my @child_fns = do {
            opendir(my $dh, $qfn)
                or die $!;
            grep !/^\.\.?\z/, readdir($dh);    # skip '.' and '..'
        };

        my @children;
        for my $child_fn (@child_fns) {
            my $child_node = _scan("$qfn/$child_fn", $child_fn);
            $size += $child_node->{size};      # roll child sizes up into the parent
            push @children, $child_node;
        }

        $node->{children} = \@children;
    }

    $node->{size} = $size;
    return $node;
}
Rest of the code:
#!/usr/bin/perl
use strict;
use warnings;
no warnings 'recursion';
use File::Basename qw( basename );
use JSON qw( encode_json );
...
sub scan { _scan($_[0], basename($_[0])) }
print(encode_json(scan($ARGV[0] // '.')));
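If you want the per-user reports from the question, a driver along these lines would do it (a sketch; the user list and output filenames are assumptions, not part of the code above):
use JSON qw( to_json );

# Sketch: one pretty-printed JSON file per user (names assumed).
for my $user (qw( user1 user2 )) {
    my $tree = scan("/home/$user");
    open( my $fh, '>', "$user-homedir.json" ) or die $!;
    print {$fh} to_json( $tree, { pretty => 1 } );
    close($fh);
}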
Upvotes: 3
Reputation: 53498
In the end, I have done it like this.
In the File::Find wanted sub, collate_sizes:
my $cursor = $data;
foreach my $element (
    File::Spec->splitdir( $File::Find::dir =~ s/^$starting_path//r ) )
{
    $cursor->{$element}->{name} = $element;
    $cursor->{$element}->{size} += $blocks * $block_size;
    $cursor = $cursor->{$element}->{children} //= {};
}
This generates a hash of nested directory names. (The name subelement is probably redundant, but whatever.)
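For illustration, after visiting files under username/documents/excel, the intermediate structure looks something like this (sizes are placeholders):
$data = {
    username => {
        name     => 'username',
        size     => 307200,
        children => {
            documents => {
                name     => 'documents',
                size     => 153600,
                children => {
                    excel => { name => 'excel', size => 51200, children => {} },
                },
            },
        },
    },
};
Note the children are hashes keyed by directory name rather than arrays, which is why the post-processing step below is needed for D3.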
And then post-process it with (using the JSON module):
my $json_structure = {
    'name'     => $user,
    'size'     => $data->{$user}->{size},
    'children' => [],
};

process_data_to_json( $json_structure, $data->{$user}->{children} );

open( my $json_out, '>', "homedir.json" ) or die $!;
print {$json_out} to_json( $json_structure, { pretty => 1 } );
close($json_out);

sub process_data_to_json {
    my ( $json_cursor, $data_cursor ) = @_;
    if ( ref $data_cursor eq "HASH" ) {
        foreach my $key ( keys %{$data_cursor} ) {
            print "Traversing $key\n";
            my $newelt = {
                'name' => $key,
                'size' => $data_cursor->{$key}->{size},
            };
            push( @{ $json_cursor->{children} }, $newelt );
            process_data_to_json( $newelt, $data_cursor->{$key}->{children} );
        }
    }
}
Upvotes: 0