Match two strings based on common substring

Question

I have a list of files that needs to be grouped in pairs. (I need to append an HTML 'File B' (body) to 'File A' (header) because I need to serve them statically without server-side includes).

Example:

/path/to/headers/.../matching_folder/FileA.html
/someother/path/to/.../matching_folder/body/FileB.html

Emphasizing with the ellipses that the paths are not of uniform length, nor is 'matching folder' always in the same position in the path.

It seems I need to match/join based on the common substring 'matching_folder', but I am stumped on scanning each string, storing, matching (excerpt):

my @dirs = ( $headerPath, $bodyPath );

my @files = ();

find( { wanted => \&wanted, no_chdir => 1 }, @dirs );

foreach my $file (@files) {
# pseudocode: append $file[0] to $file[1] if both paths contain same 'matching_folder'
};

sub wanted {
return unless -f and /(FileA\.html$)|(FileB\.html$)/i;
push @files, $_;
};

choroba · Accepted Answer

Hash the files by all the directory steps in their names.

#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

use File::Find;

my $headerPath = 'headers';
my $bodyPath   = 'bodies';

my @dirs = ($headerPath, $bodyPath);
my @files;

sub wanted {
    return unless -f and /file.\.html$/;
    push @files, $_;
};

find({ wanted => \&wanted, no_chdir => 1 }, @dirs);

my %common;    
for my $file (@files) {
    my @steps = split m(/), $file;
    push @{ $common{$_} }, $file for @steps;
};

# All the headers and all the bodies share their prefixes,
# but that's not what we're interested in.
delete @common{qw{ bodies headers }};

for my $step (keys %common) {
    next if 1 == @{ $common{$step} };
    print "$step common for @{ $common{$step} }
";
}

Tested on the following structure:

bodies/3/something/C/something2/fileA.html
bodies/2/junk/B/fileB.html
bodies/1/A/fileC.html
headers/a/B/fileD.html
headers/c/one/A/two/fileE.html
headers/b/garbage/C/fileF.html

Output:

B common for headers/a/B/fileD.html bodies/2/junk/B/fileB.html
C common for headers/b/garbage/C/fileF.html bodies/3/something/C/something2/fileA.html
A common for headers/c/one/A/two/fileE.html bodies/1/A/fileC.html

Match two strings based on common substring

Answers (2)

Related Questions