redchair218

Reputation: 29

Using FIND & EXEC to execute a filename-required Perl script over multiple files

I have hundreds of CSV files stored in a Unix/Linux directory. Their names adhere to the following format: MMYYYY_foo.csv. For example,

072019_foo.csv
122018_foo.csv

I'm trying to compile and convert these individually to XML using a Perl script. The command takes the form ./script.pl MMYYYY_foo, so the following commands would need to be executed for the above example:

./script.pl 072019_foo
./script.pl 122018_foo

Rather than executing the Perl script for each file individually, I am trying to loop through the files, passing each name to the Perl script for compiling. After tediously researching SO among other sources, I came up with the following ...

find . -type -f -name '*.csv' -exec perl script.pl $('-printf "%f\n"') {} \;

However, this does not work; it just outputs multiple files named ".xml". Undoubtedly the file name (minus path and extension) is not being passed to the script correctly in the code example above. I've tried multiple variations of ...

$('-printf "%f\n"')

And I know therein lies my problem: in many instances I just get multiple ".xml" files. I feel I'm on the cusp of the solution; I'm just not understanding what belongs after -exec. Any help would be appreciated.

Upvotes: 2

Views: 635

Answers (3)

Polar Bear

Reputation: 6808

The OP's sample of find indicates that each and every CSV file in the directory needs to be processed.

This assumes that no recursion into the directory structure is required.

The power of the bash shell can be used for this purpose, with the file extension stripped off before the name is passed to the script:

for f in *.csv
do
   ./script.pl "${f%.*}"
done

If this task will be repeated on a regular basis, the loop above can be stored as a shell script, or a Perl wrapper script can be created instead:

#!/usr/bin/env perl

use strict;
use warnings;

my $re = qr/(\d{6}_foo)\.csv/;

for ( glob('./*.csv') ) {
        system('./script.pl', $1) if /$re/;
}
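
Note that passing system a list, ('./script.pl', $1), bypasses the shell entirely, so the file name is not subject to any further shell interpretation.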

The natural behavior of the find command is to recurse into the directory structure; the OP should indicate in the post whether recursion is desired or not.

Suggestion: familiarize yourself with 3.5.3 Shell Parameter Expansion and How To Use Bash Parameter Substitution Like A Pro.
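
As a quick illustration of the parameter expansions used above (the file name here is just an example):

f=./072019_foo.csv
f=${f##*/}          # strip any leading path: 072019_foo.csv
echo "${f%.*}"      # strip the extension:    072019_foo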

Upvotes: 1

ikegami

Reputation: 386501

That command attempts to execute a file named -printf "%f\n" before doing anything else, which obviously fails noisily.
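
You can reproduce that failure in isolation; the shell looks the entire quoted string up as a command name:

'-printf "%f\n"'    # fails with "command not found" before find ever runs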

I think you were going for something like

find . -type f -name '*.csv' -printf '%f\0' | xargs -r0 ./script.pl

But that has two problems:

  • You strip out the path, so it doesn't make any sense to do a recursive search (like find does by default). You've confirmed in the comments that you don't need to do a recursive search.
  • That still passes the extension which you want removed.

As such, the following is the solution you seek:

find . -maxdepth 1 -name '*.csv' -printf '%f\0' |
   perl -0lpe's/\.[^.]*\z//' |
   xargs -r0 ./script.pl
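
Here, -0 makes perl read NUL-separated records, -l chomps the separator on input and adds it back on output, -p prints each record, and the substitution strips the final extension, so xargs -r0 receives NUL-separated basenames without extensions.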

or just

perl -0le'print s/\.[^.]*\z//r for @ARGV' -- *.csv |
   xargs -r0 ./script.pl

or just

perl -e'system("./script.pl", s/\.[^.]*\z//r) for @ARGV' -- *.csv

or just

perl -e'system("./script.pl", s/\.[^.]*\z//r) for glob("*.csv")'

The first and last ones handle very long lists of files better than the other two, which expand *.csv on the script's own command line and can therefore run into the kernel's argument-length limit (ARG_MAX).
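
If you'd rather stay with find -exec as in the question, a minimal sketch using an inline sh loop to strip the path and extension (the loop body is illustrative, not the only way to write it):

find . -maxdepth 1 -name '*.csv' -exec sh -c '
    for f; do
        f=${f##*/}               # drop the leading ./
        ./script.pl "${f%.*}"    # drop the .csv extension
    done' sh {} +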

Upvotes: 1

Mark Setchell

Reputation: 207758

You can get them all done very simply and in parallel with GNU Parallel like this:

parallel --dry-run perl script.pl {.} ::: *.csv
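
Here {.} is GNU Parallel's replacement string for the input with its extension removed, and ::: introduces the list of arguments.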

Sample Output

perl script.pl 072019_foo
perl script.pl 122018_foo

If that looks correct, back up your files and run it again without the --dry-run to do it for real.

You can add a progress bar with parallel --bar ...

Upvotes: 1
