Sobrique

Reputation: 53498

File path into JSON data structure

I'm doing a disk space report that uses File::Find to collect cumulative sizing in a directory tree.

What I get (easily) from File::Find is the directory name.

e.g.:

/path/to/user/username/subdir/anothersubdir/etc

I'm running File::Find to collect sizes beneath:

/path/to/user/username

and build a cumulative size report of the directory and each of its subdirectories.
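The components of File::Find's absolute directory name, relative to the parent of the scan root, can be obtained with File::Spec's `abs2rel` and `splitdir` instead of a regex. This is a minimal sketch using the example paths above; `relative_parts` is a hypothetical helper name, and Unix-style absolute paths are assumed.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the components of $dir relative to $base.
# With two absolute paths, abs2rel works on the strings alone,
# so nothing needs to exist on disk for this to run.
sub relative_parts {
    my ( $dir, $base ) = @_;
    my $rel = File::Spec->abs2rel( $dir, $base );
    return File::Spec->splitdir($rel);
}

my @parts = relative_parts(
    '/path/to/user/username/subdir/anothersubdir/etc',
    '/path/to/user',
);
print join( ',', @parts ), "\n";    # username,subdir,anothersubdir,etc
```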

What I've currently got is:

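while ( $dir_tree ) {
   # add this file's size to the directory, then to each parent in turn
   $results{$dir_tree} += $blocks * $block_size;
   my @path_arr = split ( "/", $dir_tree );
   pop ( @path_arr );                      # drop the last path component
   $dir_tree = join ( "/", @path_arr );
}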

(And yes, I know that's not very nice.)

The purpose of doing this is that when I stat each file, I add its size to the current node and to each parent node in the tree.
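The same "add to the node and every parent" step can be done without the split/pop/join round trip: walk the path components once from the top and build each prefix with `File::Spec->catdir`. This is a minimal sketch, assuming relative directory names like those in the report below; `add_to_ancestors` is a hypothetical name and the sizes are arbitrary units.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: add $bytes to $dir and to every ancestor of $dir
# (relative paths only, e.g. "username/documents/excel").
sub add_to_ancestors {
    my ( $totals, $dir, $bytes ) = @_;
    my @prefix;
    for my $part ( File::Spec->splitdir($dir) ) {
        next unless length $part;    # skip empty components, e.g. from "a//b"
        push @prefix, $part;
        $totals->{ File::Spec->catdir(@prefix) } += $bytes;
    }
    return;
}

my %totals;
add_to_ancestors( \%totals, 'username/documents/excel', 50 );
add_to_ancestors( \%totals, 'username/documents/word',  40 );
add_to_ancestors( \%totals, 'username/work',            70 );

for my $dir ( sort keys %totals ) {
    print "$dir,$totals{$dir}\n";
}
```

Each file costs one pass over its path, and the flat hash keeps the same `path => cumulative size` shape as the existing report.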

This is sufficient to generate:

username,300M
username/documents,150M
username/documents/excel,50M
username/documents/word,40M
username/work,70M
username/fish,50M
username/some_other_stuff,30M

But I'd now like to turn that into JSON more like this:

{
    "name" : "username",
    "size" : "307200",
    "children" : [
        {
            "name" : "documents",
            "size" : "153600",
            "children" : [
                {
                    "name" : "excel",
                    "size" : "51200"
                },
                {
                    "name" : "word",
                    "size" : "40960"
                }
            ]
        }
    ]
}

That's because I'm intending to build a D3 visualisation of this structure, loosely based on the D3 Zoomable Circle Pack.

So my question is this: what is the neatest way to collate my data so that I get cumulative (and ideally non-cumulative) sizing information, while populating a hash hierarchically?
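One way to get from the flat `path => size` report to the nested D3 shape is to process paths from shallowest to deepest, create one node per path, and attach each node to its parent's `children` array. This is a minimal sketch, assuming the flat hash already holds cumulative sizes keyed by relative `/`-separated paths and that every ancestor also appears as a key; `flat_to_tree` is a hypothetical name, and JSON::PP (shipped with Perl) stands in for the JSON module.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical helper: build { name, size, children } nodes from a flat
# hash of cumulative sizes such as ( 'username/documents' => 153600 ).
sub flat_to_tree {
    my ($flat) = @_;
    my %node_for;    # path => node, so each path gets exactly one node
    my @roots;

    # Fewer slashes first, so every parent exists before its children.
    for my $path ( sort { ( $a =~ tr{/}{} ) <=> ( $b =~ tr{/}{} ) || $a cmp $b }
        keys %$flat )
    {
        my ( $parent, $name ) = $path =~ m{^(?:(.*)/)?([^/]+)$};
        my $node = { name => $name, size => $flat->{$path} };
        $node_for{$path} = $node;
        if ( defined $parent && $node_for{$parent} ) {
            push @{ $node_for{$parent}{children} }, $node;
        }
        else {
            push @roots, $node;    # no known parent: top-level node
        }
    }
    return @roots;
}

my %flat = (
    'username'                 => 307200,
    'username/documents'       => 153600,
    'username/documents/excel' => 51200,
    'username/documents/word'  => 40960,
    'username/work'            => 71680,
);
my ($tree) = flat_to_tree( \%flat );
print JSON::PP->new->canonical->pretty->encode($tree);
```

Leaf nodes get no `children` key at all, which is what the circle-pack layout expects for leaves.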

I was thinking in terms of a 'cursor' approach (and using File::Spec this time):

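use File::Spec;
my $data   = {};        # must be a hash ref, not a reference to a scalar
my $cursor = $data;
foreach my $element ( File::Spec->splitdir( $File::Find::dir ) ) {
   # step into (creating on first use) the node for this component,
   # then add the file's size to it
   $cursor = $cursor->{$element} //= {};
   $cursor->{size} += $blocks * $block_size;
}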

Although... that's not quite creating the data structure I'm looking for, not least because we basically have to search by hash key to do the 'rolling up' part of the process.
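A minimal sketch of the cursor idea that avoids mixing sizes and directory names in one hash: each node keeps its child nodes under a separate `children` key, every directory on the way down receives the cumulative `size`, and only the file's own directory receives the non-cumulative `own` size. `add_file` and the `own` key are hypothetical; the path components would come from `File::Spec->splitdir`, as in the cursor snippet above.

```perl
use strict;
use warnings;

# Hypothetical helper: record $bytes for a file that lives in the
# directory whose components are @parts (e.g. qw(username documents excel)).
sub add_file {
    my ( $root, $bytes, @parts ) = @_;
    my $cursor = $root;
    for my $part (@parts) {
        # Create the child node on first sight, then step into it.
        $cursor = $cursor->{children}{$part}
            //= { name => $part, size => 0, own => 0 };
        $cursor->{size} += $bytes;    # cumulative: every directory on the path
    }
    $cursor->{own} += $bytes;         # non-cumulative: the file's own directory
    return;
}

# $root is a nameless holder; the real top node is $root->{children}{username}.
my $root = { children => {} };
add_file( $root, 50, qw(username documents excel) );
add_file( $root, 40, qw(username documents word) );
add_file( $root, 10, qw(username documents) );
```

Because names only ever appear as keys of `children`, a directory called `size` cannot collide with the size field.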

Is there a better way of accomplishing this?

Edit: a more complete example of what I have already:

#!/usr/bin/env perl

use strict;
use warnings;

use File::Find;
use Data::Dumper;

# stat's blocks field is usually counted in 512-byte units
# (system-specific; see perldoc -f stat), not the filesystem block size
my $block_size = 512;

sub collate_sizes {
    my ( $results_ref, $starting_path ) = @_;

    # strip the last component, so "/home/user1" becomes "/home/"
    # and the result keys keep the username ("user1/...")
    $starting_path =~ s,/\w+$,/,;

    if ( -f $File::Find::name ) {
        print "$File::Find::name is a file\n";
        my $blocks = ( stat(_) )[12];    # reuse the stat done by -f

        my $dir_tree = $File::Find::dir;
        $dir_tree =~ s|^\Q$starting_path\E||;    # \Q...\E: match the path literally
        while ($dir_tree) {
            print "Updating $dir_tree\n";
            $results_ref->{$dir_tree} += $blocks * $block_size;
            my @path_arr = split( "/", $dir_tree );
            pop(@path_arr);
            $dir_tree = join( "/", @path_arr );
        }
    }
}

my @users = qw( user1 user2 );

foreach my $user (@users) {
    my $path = "/home/$user";
    print "$path\n";
    my %results;
    File::Find::find(
        {   wanted   => sub { collate_sizes( \%results, $path ) },
            no_chdir => 1
        },
        $path
    );
    print Dumper \%results;

    # would print this to a file in the homedir; STDOUT for convenience
    foreach my $key ( sort { $results{$b} <=> $results{$a} } keys %results ) {
       print "$key => $results{$key}\n";
    }
}

And yes, I know this isn't portable and does a few somewhat nasty things. Part of what I'm doing here is trying to improve on that. (But currently it's a Unix-based homedir structure, so that's fine.)

Upvotes: 6

Views: 1015

Answers (2)

ikegami

Reputation: 386361

If you do your own directory scanning instead of using File::Find, you naturally get the right structure: each recursive call returns a finished node whose size already includes its children.

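sub _scan {
   my ($qfn, $fn) = @_;    # qualified path, and the name to show for it
   my $node = { name => $fn };

   # lstat rather than stat, so symlinks are not followed
   lstat($qfn)
      or die "Can't lstat \"$qfn\": $!\n";

   my $size   = -s _ || 0;    # "_" reuses the lstat; -s is false (not 0) for empty files
   my $is_dir = -d _;

   if ($is_dir) {
      my @child_fns = do {
         opendir(my $dh, $qfn)
            or die "Can't opendir \"$qfn\": $!\n";

         grep !/^\.\.?\z/, readdir($dh);    # skip "." and ".."
      };

      my @children;
      for my $child_fn (@child_fns) {
         my $child_node = _scan("$qfn/$child_fn", $child_fn);
         $size += $child_node->{size};      # roll the child's total up
         push @children, $child_node;
      }

      $node->{children} = \@children;
   }

   $node->{size} = $size;
   return $node;
}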

Rest of the code:

#!/usr/bin/perl

use strict;
use warnings;
no warnings 'recursion';

use File::Basename qw( basename );
use JSON           qw( encode_json );

...    # the _scan sub from above goes here

sub scan { _scan($_[0], basename($_[0])) }

print(encode_json(scan($ARGV[0] // '.')));
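`-s` reports a file's apparent size, whereas a disk-usage report (like the question's `$blocks * $block_size`) usually wants allocated space. Below is a minimal sketch of the same recursive scan counting allocated space instead, assuming the stat blocks field is in 512-byte units, as it commonly is on Unix-like systems (perldoc -f stat notes this is system-specific). It also records each node's non-cumulative `own` size; `scan_blocks`, `$BLOCK_UNIT` and `own` are hypothetical names.

```perl
use strict;
use warnings;
no warnings 'recursion';

# Assumption: stat's blocks field counts 512-byte units (common on Linux/BSD).
my $BLOCK_UNIT = 512;

# Hypothetical variant of _scan: same tree shape, but sizes are allocated
# bytes, and 'own' holds the entry's own usage (not cumulative).
sub scan_blocks {
    my ( $qfn, $fn ) = @_;
    my @st = lstat($qfn)
        or die "Can't lstat \"$qfn\": $!\n";
    my $own  = ( $st[12] || 0 ) * $BLOCK_UNIT;    # field 12 is the block count
    my $node = { name => $fn, own => $own, size => $own };

    if ( -d _ ) {    # "_" reuses the lstat above, so symlinks are not followed
        opendir( my $dh, $qfn ) or die "Can't opendir \"$qfn\": $!\n";
        my @child_fns = grep { !/^\.\.?\z/ } readdir($dh);
        closedir($dh);
        for my $child_fn ( sort @child_fns ) {
            my $child = scan_blocks( "$qfn/$child_fn", $child_fn );
            $node->{size} += $child->{size};    # roll the child's total up
            push @{ $node->{children} }, $child;
        }
    }
    return $node;
}
```

It can be dropped into the rest of the code above in place of _scan, e.g. `sub scan { scan_blocks($_[0], basename($_[0])) }`.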

Upvotes: 3

Sobrique

Reputation: 53498

In the end, I did it like this:

In collate_sizes, the File::Find wanted sub:

my $cursor = $data;    # $data must already be a hash ref (my $data = {};)
foreach my $element (
    File::Spec->splitdir( $File::Find::dir =~ s/^\Q$starting_path\E//r ) )
{
    $cursor->{$element}->{name} = $element;
    $cursor->{$element}->{size} += $blocks * $block_size;
    # step into this directory's children, creating the hash on first use
    $cursor = $cursor->{$element}->{children} //= {};
}

This generates a hash of nested directory names. (The name sub-element is probably redundant, but whatever.)

I then post-process it (using the JSON module) with:

my $json_structure = {
    'name'     => $user,
    'size'     => $data->{$user}->{size},
    'children' => [],
};
process_data_to_json( $json_structure, $data->{$user}->{children} );
open( my $json_out, '>', "homedir.json" ) or die $!;
print {$json_out} to_json( $json_structure, { pretty => 1 } );
close($json_out);


sub process_data_to_json {
    my ( $json_cursor, $data_cursor ) = @_;
    return unless ref $data_cursor eq "HASH";

    # one JSON child per directory name at this level
    foreach my $key ( sort keys %$data_cursor ) {
        print "Traversing $key\n";
        my $newelt = {
            'name' => $key,
            'size' => $data_cursor->{$key}->{size},
        };
        push( @{ $json_cursor->{children} }, $newelt );
        process_data_to_json( $newelt, $data_cursor->{$key}->{children} );
    }
}
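The same conversion can also be written as a function that returns new nodes instead of pushing into a cursor, which makes it easy to order the children, for example largest first for a circle pack. This is a minimal sketch, assuming the nested `{ name, size, children => { ... } }` hash built by the wanted sub above; `to_d3_node` is a hypothetical name and the sizes are arbitrary units.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one node of the nested hash (children keyed
# by directory name) into a D3-style node (children as an array, largest first).
sub to_d3_node {
    my ($node) = @_;
    my $out  = { name => $node->{name}, size => $node->{size} };
    my $kids = $node->{children} || {};
    if (%$kids) {    # leaves (empty children hash) get no children key
        $out->{children} = [
            map  { to_d3_node($_) }
            sort { $b->{size} <=> $a->{size} || $a->{name} cmp $b->{name} }
            values %$kids
        ];
    }
    return $out;
}

my $data = {
    username => {
        name     => 'username',
        size     => 300,
        children => {
            documents => { name => 'documents', size => 150, children => {} },
            work      => { name => 'work',      size => 70,  children => {} },
            fish      => { name => 'fish',      size => 50,  children => {} },
        },
    },
};
my $d3 = to_d3_node( $data->{username} );
```

`$d3` can then be passed straight to `to_json( $d3, { pretty => 1 } )`.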

Upvotes: 0
