sfabel
sfabel

Reputation: 31

perl replacing serialized strings from sql dump

I'm having to replace fqdn's inside a SQL dump for website migration purposes. I've written a perl filter that's supposed to take STDIN, replace the serialized strings containing the domain name that's supposed to be replaced, replace it with whatever argument is passed into the script, and output to STDOUT.

This is what I have so far:

my $search   = $ARGV[0];
my $replace  = $ARGV[1];
my $offset_s = length($search);
my $offset_r = length($replace);
my $regex    = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };

while (<STDIN>) {
    my @fs = split(';', $_);
    foreach (@fs) {
        chomp;
        if (m#$regex#g) {
        my ( $len, $extra, $str ) = ( $1, $2, $3 );
        my $new_len = $len - $offset_s + $offset_r;
        $str =~ eval { s/$search/$replace/ };
        print 's:' . $new_len . ':' . $extra . $str . '\"'."\n";
        }
    }
}

The filter gets passed data that may look like this (this is taken from a wordpress dump, but we're also supposed to accommodate drupal dumps:

INSERT INTO `wp_2_options` VALUES (1,'siteurl','http://to.be.replaced.com/wordpress/','yes'),(125,'dashboard_widget_options','
a:2:{
s:25:\"dashboard_recent_comments\";a:1:{
s:5:\"items\";i:5;
}
s:24:\"dashboard_incoming_links\";a:2:{
s:4:\"home\";s:31:\"http://to.be.replaced.com/wordpress\";
s:4:\"link\";s:107:\"http://blogsearch.google.com/blogsearch?scoring=d&partner=wordpress&q=link:http://to.be.replaced.com/wordpress/\";
}
}
','yes'),(148,'theme_175','
a:1:{
s:13:\"courses_image\";s:37:\"http://to.be.replaced.com/files/image.png\";
}
','yes')

The regex works if I don't have any periods in my $search. I've tried escaping the periods, i.e. domain\.to\.be\.replaced, but that didn't work. I'm probably doing this either in a very roundabout way or missing something obvious. Any help would be greatly appreciated.

Upvotes: 1

Views: 310

Answers (3)

regilero
regilero

Reputation: 30496

I'm not sure your perl regex would replace the DNS in string matching several times the old DNS (in the same serialized string).

I made a gist with a script using bash, sed and one big perl regex for this same problem. You may give it a try.

The regex I use is something like that (exploded for lisibility, and having -7 as the known difference between domain names lengths):

perl -n -p -i -e '1 while s#
  ([;|{]s:)
  ([0-9]+)
  :\\"
  (((?!\\";).)*?)
  (domain\.to\.be\.replaced)
  (.*?)
  \\";#"$1".($2-7).":\\\"$3new.domain.tld$6\\\";"#ge;' file

Which is maybe not the best one but at least it seems to de the job. The g option manages lines containing several serialized strings to cleanup and the while loop redo the whole job until no replacement occurs in serilized strings (for strings containing several occurences of the DNS). I'm not fan enough of regex to try a recursive one.

Upvotes: 0

collapsar
collapsar

Reputation: 17238

you have to double the escape char \ in your $search variable for the interpolated string to contain the escaped periods.

i.e. domain\.to\.be\.replaced -> domain.to.be.replaced (not wanted)

while domain\\.to\\.be\\.replaced -> domain\.to\.be\.replaced (correct).

Upvotes: 1

Birei
Birei

Reputation: 36272

There is no need to evaluate (eval) your regular expression because of including variables in them. Also, to avoid the special meaning of metacharacters of those variables like $search, escape them using quotemeta() function or including the variable between \Q and \E inside the regexp. So instead of:

my $regex = eval { "s\:([0-9]+)\:\\\"(https?\://.*)($search.*)\\\"" };

Use:

my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)(\Q$search\E.*)\\\"};

or

my $quoted_search = quotemeta $search;
my $regex = qr{s\:([0-9]+)\:\\\"(https?\://.*)($quoted_search.*)\\\"};

And the same advice for this line:

$str =~ eval { s/$search/$replace/ };

Upvotes: 2

Related Questions