Reputation: 609
I don't know enough about Perl to even know what I'm asking for exactly, but I'm writing a series of subroutines to be available for many individual scripts that all process different incoming flat files. The process is far from perfect, but it's what I've got to deal with and I'm trying to build myself a small library of subs that make it easier for me to manage it all. Each script handles a different incoming flat file with its own formatting, sorting, grouping and outputting requirements. One common aspect is that we have small text files that house counters that are used to name the output files so that we have no duplicate file names.
Because the processing of the data is different for each file, I can't wrap the whole job in one sub. Opening the counter file to get my counter value is a common operation, though, so I'd like to put that in a sub. After that I need to write file-specific code to process the data, and I'd like a second sub that updates the counter file with the new counter value once I've processed the data.
Is there a way to make the second sub call a requirement if the first one is called? Ideally if it could even be an error that would prevent the script from running at all much like a syntax error.
EDIT: Here is a little [ugly and simplified] pseudo-code to give a better feel for what the current process is:
require "importLibrary.plx";
#open data source file
open DataIn, $filename;
# call getCounterInfo from importLibrary.plx to get
# the counter value from counter file
$counter = &getCounterInfo($counterFileName);
while (<DataIn>) {
    # Process data based on unique formatting and requirements
    # output to task files based on requirements and name files
    # using the $counter
    # increment $counter
}
#update counter file with new value of $counter
&updateCounterInfo($counter);
Upvotes: 2
Views: 233
Reputation: 57590
I don't quite get what you are trying to do, but you can always make your subs pluggable:
We have a sub process_file. It takes a subroutine as an argument that will do the main processing:
our $counter;
sub process_file {
    my ($subroutine, @args) = @_;
    local $counter = get_counter();
    my @return_value = $subroutine->(@args);
    set_counter($counter);
    return @return_value;
}
# Here are other sub definitions for the main processing
# They can see $counter and always magically have the right value.
# If they assign to it, the counter file will be updated afterwards.
Assuming we have a sub process_type_A, we can then do
my @return_values = process_file(\&process_type_A, $arg1, $arg2, $arg3);
This behaves just like process_type_A($arg1, $arg2, $arg3), except for the extra call stack frame and the $counter setting.
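The snippet above relies on get_counter and set_counter, which are not shown. Here is a minimal sketch of how the pieces could fit together, assuming the counter file holds a single integer on its first line. The $counter_file variable, its default path and the body of process_type_A are made up for illustration and are not part of the original library.

```perl
use strict;
use warnings;

our $counter;                        # shared with the processing subs
our $counter_file = 'counter.txt';   # hypothetical path, adjust as needed

# Read the current counter value; treat a missing or empty file as 0.
sub get_counter {
    return 0 unless -e $counter_file;
    open my $fh, '<', $counter_file or die "Can't read $counter_file: $!";
    my $value = <$fh>;
    close $fh;
    return 0 unless defined $value;
    chomp $value;
    die "Counter file $counter_file does not hold an integer\n"
        unless $value =~ /^\d+$/;
    return $value + 0;
}

# Write the new counter value back to the counter file.
sub set_counter {
    my ($value) = @_;
    open my $fh, '>', $counter_file or die "Can't write $counter_file: $!";
    print {$fh} "$value\n";
    close $fh or die "Can't close $counter_file: $!";
    return;
}

# Wrap any processing sub so the counter is always read before
# and written back after the processing.
sub process_file {
    my ($subroutine, @args) = @_;
    local $counter = get_counter();
    my @return_value = $subroutine->(@args);
    set_counter($counter);
    return @return_value;
}

# Hypothetical processing sub: names one output file per record.
sub process_type_A {
    my @records = @_;
    my @names;
    for my $record (@records) {
        push @names, sprintf('out_%04d.txt', $counter);
        $counter++;
    }
    return @names;
}
```

With the counter file containing 7, process_file(\&process_type_A, 'a', 'b') returns out_0007.txt and out_0008.txt and leaves 9 in the file. Note that if the processing sub dies, set_counter is never reached, so the file keeps its old value.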
If you prefer passing names instead of coderefs, we can arrange for that too.
package MitchelWB::FileParsingLib;
our $counter;
our %file_type_processing_hash = (
    "typeA" => \&process_type_A,
    "typeB" => \&process_type_B,
    "countLines" => sub { # anonymous sub
        open my $fh, '<', "./dir/$counter.txt" or die "can't open $counter file: $!";
        my $lines = 0;
        $lines++ while <$fh>;
        return $lines;
    },
);
sub process_file {
    my ($filetype, @args) = @_;
    local $counter = get_counter();
    # fetch appropriate subroutine:
    my $subroutine = $file_type_processing_hash{$filetype};
    die "$filetype is not registered" if not defined $subroutine; # check for existence
    die "$filetype is not assigned to a sub" if ref $subroutine ne 'CODE'; # check that we have a sub
    # execute
    my @return_value = $subroutine->(@args);
    set_counter($counter);
    return @return_value;
}
...;
my $num_of_lines = process_file('countLines');
Why stupid callbacks? Why extra code? Why calling conventions? Why dispatch tables? While they all are very interesting and flexible, there is a more elegant solution. I had just forgotten a tiny little piece of information, but now it has all fallen into place. Perl has "Attributes", known as "Annotations" in other languages, that allow us to, well, annotate code or variables.
Defining a new Perl attribute is easy. We use Attribute::Handlers and define a sub with the same name as the attribute we want to use:
use Attribute::Handlers;
sub file_processor :ATTR(CODE) {
    my (undef, $glob, $subroutine) = @_;
    no warnings 'redefine'; # we replace the sub on purpose
    *{$glob} = sub {
        local $counter = get_counter();
        my @return_value = $subroutine->(@_);
        set_counter($counter);
        return @return_value;
    };
}
We use the attribute :ATTR(CODE) to denote that this is an attribute applicable to subroutines. Of the arguments passed to the handler we only need two: a reference to the symbol table entry (typeglob) of the subroutine we want to annotate, and a coderef to the sub.
We then redefine the sub by assigning a new coderef to *{$glob}, with the "redefine" warning switched off inside the handler. This is a bit advanced, but it essentially just accesses the internal symbol table.
We replace the annotated sub with a dumbed-down version of process_file as given above. We can pass all arguments (@_) right through without further processing.
After all that, we add a tiny little piece of information to the subs you used before:
sub process_type_A :file_processor {
    print "I can haz $counter\n";
}
… and it just does the replacement without further modifications. The changes are invisible when using the library. I am aware of the restrictions of this approach, but you are unlikely to run into them when writing ordinary code.
Upvotes: 4
Reputation: 22893
Well, you could set a global flag and use an END block.
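A rough sketch of that flag idea, assuming the counter file holds a single integer. The flag name $update_pending and the two-argument form of update_counter are made up for illustration. Note that this is a check at exit time, not something Perl can reject before the script runs the way it rejects a syntax error.

```perl
use strict;
use warnings;

my $update_pending = 0;   # set by get_counter, cleared by update_counter

# Read the counter and remember that an update is now owed.
sub get_counter {
    my ($counter_file) = @_;
    open my $fh, '<', $counter_file or die "Can't read $counter_file: $!";
    my $value = <$fh>;
    close $fh;
    $value = 0 unless defined $value;
    chomp $value;
    die "Counter file $counter_file does not hold an integer\n"
        unless $value =~ /^\d+$/;
    $update_pending = 1;
    return $value + 0;
}

# Write the counter back and clear the flag.
sub update_counter {
    my ($counter_file, $value) = @_;
    open my $fh, '>', $counter_file or die "Can't write $counter_file: $!";
    print {$fh} "$value\n";
    close $fh or die "Can't close $counter_file: $!";
    $update_pending = 0;
    return;
}

# END blocks run when the script exits, normally or after die.
END {
    if ($update_pending) {
        warn "get_counter() was called but update_counter() never was\n";
        $? ||= 1;   # report the problem through the exit status
    }
}
```

A script that calls get_counter and then exits without calling update_counter prints the warning and exits with a non-zero status, which at least makes the omission hard to miss during testing.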
Perhaps neater is something like @amon's proposal or even just putting your file processing in a standard named sub and calling it from your counter code.
my ($fh, $counter) = get_counter(...);
my $ok = process_file($fh, $counter);
update_counter($counter) if $ok;
Your process_file will be exported from a module, and if you want to keep it really simple, use perl's -M switch to load the module with your process_file sub.
Upvotes: 0