mickeyj

Reputation: 101

Substitution on string in perl changes string to an integer value

I am trying to delete some characters matching a regex in Perl, but when I do that the string is replaced by an integer value.

I have tried substituting runs of whitespace in a string with the empty string, in other words deleting the whitespace.

#! /usr/intel/bin/perl

my $line = "foo/\\bar car"; 
print "$line\n";
#$line = ~s/(\\|(\s)+)+//; <--Ultimately need this, where backslash and space needs to be deleted. Tried this, returns integer value
$line = ~s/\s+//; # tried this, returns integer value
print "$line\n"; 

Expected results:
First print: foo/\bar car
Second print: foo/barcar

Actual result:
First print: foo/\\bar car
Second print: 18913234908

Upvotes: 0

Views: 181

Answers (2)

Dave Cross

Reputation: 69264

You've already accepted an answer, but I thought it would be useful to give you a few more details.

As you now know,

$line = ~s/\s+//;

is completely different to:

$line =~ s/\s+//;

You wanted the second, but you typed the first. So what did you end up with?

~ is the bitwise negation operator. That is, it treats its argument as a binary number and then flips every bit of that number: all the zeros become ones and all the ones become zeros.

So you're asking for the bitwise negation of s/\s+//, which means the negation works on the value returned by s/\s+//. And the value returned by a substitution is the number of substitutions made.

We can now work out all of the details.

  • s/\s+// carries out the substitution (on $_, because no variable is bound to it with =~) and returns the number of substitutions made (an integer).
  • ~s/\s+// returns the bitwise negation of the integer returned by the substitution (which is also an integer).
  • $line = ~s/\s+// takes that second integer and assigns it to the variable $line.

If the first step returns 1 (you don't use /g on your s/.../.../, so at most one substitution will be made), it's easy enough to get the bitwise negation of 1.

$ perl -E'say ~1'
18446744073709551614

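So the integer you're seeing is probably of this kind. The exact value depends on the integer size (it will differ on a 32-bit system) and on whether anything was actually substituted: a substitution that matches nothing returns false, which is numerically 0, and ~0 is 18446744073709551615 on a 64-bit perl.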
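The three steps can be tried out one at a time. This is only a small sketch: the sample string and the variable names ($text, $copy_once, $copy_all and so on) are made up for illustration and are not from the question.

```perl
use strict;
use warnings;

my $text = "a  b   c";               # hypothetical sample with two runs of spaces

# Step 1: s/// returns the number of substitutions made.
my $copy_once = $text;
my $once = ($copy_once =~ s/\s+//);  # no /g, so at most one substitution

my $copy_all = $text;
my $all = ($copy_all =~ s/\s+//g);   # /g substitutes every match

# Step 2: unary ~ flips every bit of that integer.
my $negated = ~$once;

# Step 3: assigning $negated to a variable is all that "= ~" does.
print "without /g: '$copy_once' ($once substitution)\n";
print "with /g:    '$copy_all' ($all substitutions)\n";
print "negated:    $negated\n";
```

The printed value of $negated depends on the integer width of your perl, which is why it is not shown here.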

Upvotes: 0

melpomene

Reputation: 85767

The proper solution is

$line =~ s/[\s\\]+//g;

Note:

  • g flag to substitute all occurrences
  • no space between = and ~

=~ is a single operator, binding the substitution operator s to the target variable $line.
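As a quick check, here is that substitution applied to the string from the question, once without and once with the g flag. The variable names $first_only and $cleaned are only for illustration.

```perl
use strict;
use warnings;

my $line = "foo/\\bar car";          # the string is: foo/\bar car

# Without /g only the first run of backslashes/whitespace is removed.
my $first_only = $line;
$first_only =~ s/[\s\\]+//;

# With /g every run is removed.
my $cleaned = $line;
$cleaned =~ s/[\s\\]+//g;

print "original:   $line\n";
print "without /g: $first_only\n";
print "with /g:    $cleaned\n";
```

Copying $line first keeps the original intact, since =~ s/// modifies its target in place.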

Inserting a space (as in your code) means s binds to the default target, $_, because there is no explicit target. The return value (which is the number of substitutions made) then has all its bits inverted (unary ~ is bitwise complement) and is assigned to $line.

In other words,

$line = ~ s/...//

parses as

$line = ~(s/...//)

which is equivalent to

$line = ~($_ =~ s/...//)
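One way to see that $_ really is the target is to give it a value and watch which variable changes. A minimal sketch, with a made-up value for $_ and illustrative variable names:

```perl
use strict;
use warnings;

my $line = "foo bar";
my ($result, $topic_after);

{
    local $_ = "one two";     # hypothetical value, only for this demonstration
    $result = ~ s/\s+//;      # parses as ~($_ =~ s/\s+//)
    $topic_after = $_;        # the substitution edited $_ ...
}

print "\$_ became:      $topic_after\n";
print "\$line is still: $line\n";   # ... and never touched $line
print "result:         $result\n";  # bitwise complement of the count (1)
```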

If you had enabled use warnings, you would've gotten the following message:

Use of uninitialized value $_ in substitution (s///) at prog.pl line 6.
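If you want to see the warning for yourself, something like the following should do. It is a sketch: the __WARN__ handler and the @warnings array are additions for illustration, and $_ is explicitly localised so that it is guaranteed to be undefined.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go straight to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $line = "foo bar";
{
    local $_;                 # make sure $_ is undefined, as in the question
    $line = ~s/\s+//;         # the mistyped form: the substitution runs on $_
}

print "warning: $_" for @warnings;
```

The line number in the message will of course differ from the one above.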

Upvotes: 3
