user271077

Replacing multiple strings recursively within all files in a directory using Perl

I'm new to Perl. I saw many samples but had problems composing a solution. I have a list of strings, and each string should be replaced with a different string: a->a2, b->b34, etc. The list of replacements is in a CSV file. I need to perform this replacement recursively on all files in a directory. It might be any other language; I just thought Perl would be the quickest.


Answers (1)

amon

Your problem can be split into three steps:

  1. Getting the search-and-replace strings from the CSV file,
  2. Getting a list of all text files inside a given directory incl. subdirectories, and
  3. Replacing all occurrences of the search strings with their replacements.

So let's do a countdown and see how we can do that :)

#!/usr/bin/perl
use strict; use warnings;

3. Search and replace

We will define a sub searchAndReplace. It takes a file name as argument and accesses an outside hash. We will call this hash %replacements. Each key is a string we want to replace, and the value is the replacement. This "imposes" the restriction that there can only be one replacement per search string, but that should seem natural. I will further assume that each file is reasonably small (i.e. fits into RAM).

my %replacements;   # search string => replacement; filled in step 1

sub searchAndReplace {
  my ($filename) = @_;
  return unless -T $filename;   # leave files that don't look like text alone
  my $content = do {
    open my $file, "<", $filename or die "Can't open $filename: $!";
    local $/ = undef; # set slurp mode
    <$file>;
  };
  while (my ($string, $replacement) = each %replacements) {
    $content =~ s/\Q$string\E/$replacement/g;
  }
  open my $file, ">", $filename or die "Can't open $filename: $!";
  print $file $content; # no comma after the filehandle, on purpose
  close $file or die "Can't write $filename: $!";
}

This code is pretty straightforward. I escape $string with \Q...\E so that its contents are matched literally instead of being treated as a pattern. Because the replacements are applied one after another, this implementation has a side effect: a later search string can match text that an earlier replacement produced (with a->a2 and a2->x, an "a" may end up as "x", depending on the hash order). One could work around that if this is absolutely necessary.
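If that side effect matters, one possible workaround (a sketch, not part of the code above) is to join all search strings into a single regex alternation and replace them in one left-to-right pass, so text produced by a replacement is never scanned again. The hash contents and the sub name replaceAll are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical mapping for illustration; in the real script it comes from the CSV.
my %replacements = (a => 'a2', b => 'b34', a2 => 'x');

# One alternation of all (escaped) search strings, longest first, so that
# "a2" is tried before its prefix "a" at the same position.
my $pattern = join '|',
  map { quotemeta } sort { length $b <=> length $a or $a cmp $b } keys %replacements;
my $regex = qr/($pattern)/;

# Replaces every match in a single left-to-right pass: text inserted by a
# replacement is never scanned again, so the "a2" produced from "a" stays "a2".
sub replaceAll {
  my ($content) = @_;
  return $content unless %replacements;   # an empty pattern would match everywhere
  $content =~ s/$regex/$replacements{$1}/g;
  return $content;
}

print replaceAll("a b a2"), "\n";   # prints "a2 b34 x"
```

Sorting longest first matters because a Perl alternation takes the first alternative that matches at a given position, not the longest one. In searchAndReplace, the while/each loop would then shrink to a single s/$regex/$replacements{$1}/g on $content.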

2. Traversing the file tree

We will define a sub called anakinFileWalker. It takes the name of a file or directory and a code reference to an action (here \&searchAndReplace) as arguments. If the name refers to a plain file, it runs the action on it; if it is a directory, it opens the directory and calls itself on each entry.

sub anakinFileWalker {
  my ($filename, $action) = @_;
  if (-d $filename) {
    opendir my $dir, $filename or die "Can't open $filename: $!";
    while (defined(my $entry = readdir $dir)) {
      next if $entry eq '.' or $entry eq '..';
      # come to the dark side of recursion
      anakinFileWalker("$filename/$entry", $action); # be sure to give full path
    }
  } elsif (-f $filename) {
    # Houston, we have a plain file:
    $action->($filename);
  }
}

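Of course, this sub follows symlinks, so it blows up if you have looping symlinks: it keeps recursing until opendir fails and the script dies. Adding return if -l $filename; at the top of the sub avoids that.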
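An alternative to the hand-written walker is the core module File::Find, which does not follow symlinks unless you ask it to, so the loop problem goes away. A minimal sketch; the sub name walkWithFileFind is hypothetical, and skipping symlinked files is a design choice so that a linked file is not rewritten a second time through its link.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Calls $action->($path) for every plain file below $topDirectory.
# File::Find does not follow symbolic links unless told to (follow => 1),
# so a symlink loop cannot trap it in endless recursion.
sub walkWithFileFind {
  my ($topDirectory, $action) = @_;
  find({
    no_chdir => 1,   # don't chdir into each directory; $File::Find::name is a usable path
    wanted   => sub {
      my $path = $File::Find::name;
      return if -l $path;              # skip symlinks so no file is processed twice
      $action->($path) if -f $path;    # only plain files reach the action
    },
  }, $topDirectory);
}
```

It plugs in the same way as the original walker: walkWithFileFind($topDirectory, \&searchAndReplace);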

1. Setting up the %replacements

The CPAN module Text::CSV handles the parsing for you, including quoted fields that contain commas. Just make sure that %replacements meets the definition above (search string as key, replacement as value), but that isn't hard.
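A minimal sketch of that step, assuming a two-column CSV (search string, then replacement) with no header row; the sub name loadReplacements and the file layout are assumptions, not something the question specifies. If a search string appears twice, the later line wins.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

# Reads "search,replacement" rows into a hash and returns it as a list.
# Assumes no header row and at most two columns per line.
sub loadReplacements {
  my ($csvFile) = @_;
  my $csv = Text::CSV->new({ binary => 1, auto_diag => 1 });
  open my $fh, "<", $csvFile or die "Can't open $csvFile: $!";
  my %replacements;
  while (my $row = $csv->getline($fh)) {
    my ($string, $replacement) = @$row;
    next unless defined $string and length $string;   # skip blank lines
    $replacements{$string} = defined $replacement ? $replacement : '';
  }
  close $fh;
  return %replacements;
}
```

To fit the design above, assign the result to the outer hash before starting the walk: %replacements = loadReplacements($csvFile);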

Starting it all

When %replacements is ready, we just do

anakinFileWalker($topDirectory, \&searchAndReplace);

and it should work. If not, this should have given you an idea of how to solve such a problem.
