midastown

Reputation: 13

Python vs Perl and byte count correctness

When I pipe a string to wc -c to get its byte count, the result from Python is one byte larger than the result from Perl.

Why is that?

Is this problem exclusive to chars or can this arise in other types?

If so, is there a known offset table for each type?

$ python -c 'print("A")' | wc -c
2
$ python -c 'print("A" * 50)' | wc -c
51

$ perl -e 'print "A"' | wc -c
1
$ perl -e 'print "A" x 50' | wc -c
50

Upvotes: 1

Views: 85

Answers (2)

brian d foy

Reputation: 132832

Perl and Python choose different defaults for the output record separator. You can see the extra newline when you look at the output as octets:

$ python -c 'print("A")' | hexdump
0000000 41 0a
0000002

$ perl -e 'print "A"'  | hexdump
0000000 41
0000001
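
The same check can be made without hexdump. The following is a minimal sketch, assuming Python 3; printed_bytes is a hypothetical helper name, not something from the question. It captures what print() writes into an in-memory buffer and returns it as bytes:

```python
import io

def printed_bytes(*args, **kwargs):
    # Hypothetical helper: capture what print() would write, as UTF-8 bytes.
    buf = io.StringIO()
    print(*args, file=buf, **kwargs)
    return buf.getvalue().encode("utf-8")

# The trailing newline comes from print's default end="\n".
print(printed_bytes("A"))             # b'A\n'
print(len(printed_bytes("A" * 50)))   # 51
```

The count is always the payload plus one byte for the newline, which matches the wc -c results in the question.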

That's not the only difference. Python also adds a space between the arguments to print, whereas Perl adds nothing. Ruby's puts adds a newline after each argument:

$ python -c 'print("A", "B")' | hexdump
0000000 41 20 42 0a
0000004

$ perl -e 'print "A", "B"'  | hexdump
0000000 41 42
0000002

$ ruby -e 'puts( "A", "B" )' | hexdump
0000000 41 0a 42 0a
0000004

Perl can add the newline for you. On the command line, the -l switch does that automatically for print (but not printf). Inside the code, say does that too, but it still does not add any characters between arguments. The -E switch is like -e but enables the features added since v5.10, of which say is one:

$ perl -le 'printf "%s%s", "A", "B"'  | hexdump
0000000 41 42
0000002

$ perl -le 'print "A", "B"'  | hexdump
0000000 41 42 0a
0000003

$ perl -lE 'say "A", "B"'  | hexdump
0000000 41 42 0a
0000003

When you deparse one of these, you can see that Perl is merely setting the output record separator, $\, for you. It is an ordinary global variable:

$ perl -MO=Deparse -le 'print "A", "B"'
BEGIN { $/ = "\n"; $\ = "\n"; }
print 'A', 'B';
-e syntax OK

You can set the output record separator yourself too:

$ perl -e '$\ = "\n"; print "A", "B"'  | hexdump
0000000 41 42 0a
0000003

Perl controls the characters between arguments to print and say with the $, variable, so you can set that:

$ perl -lE '$, = " "; say "A", "B"'  | hexdump
0000000 41 20 42 0a
0000004

In Python you go in the opposite direction because it has different defaults. This is for Python 3:

$ python -c 'print("A", "B", sep="", end="")' | hexdump
0000000 41 42
0000002
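
To make the mapping between the two languages explicit, here is a rough sketch, assuming Python 3. The function perl_style_bytes and its parameters ofs and ors are hypothetical names chosen to mirror Perl's $, and $\ variables; they are not part of either language:

```python
import io

def perl_style_bytes(*args, ofs="", ors=""):
    # ofs stands in for Perl's $, (between arguments) and
    # ors stands in for Perl's $\ (after the last argument).
    # Both default to empty, as they do in Perl.
    buf = io.StringIO()
    print(*args, sep=ofs, end=ors, file=buf)
    return buf.getvalue().encode("utf-8")

print(perl_style_bytes("A", "B"))                      # b'AB'
print(perl_style_bytes("A", "B", ors="\n"))            # b'AB\n'
print(perl_style_bytes("A", "B", ofs=" ", ors="\n"))   # b'A B\n'
```

The three calls correspond to the Perl one-liners above: plain print, print with $\ set, and print with both $, and $\ set.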

Upvotes: 1

Steffen Ullrich

Reputation: 123375

Python's print "..." is essentially the same as Perl's print "...\n"; that is, Python adds a newline on its own and Perl does not (although Perl's say does).
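
On the question of whether this is specific to characters: the extra byte comes from print, not from the value being printed, so there should be no per-type offset table to look up. A small sketch, assuming Python 3 and UTF-8 output (emitted_byte_count is a hypothetical helper name), compares the payload size with the emitted size for a few types:

```python
import io

def emitted_byte_count(value):
    # Hypothetical helper: number of UTF-8 bytes print() writes for one value.
    buf = io.StringIO()
    print(value, file=buf)
    return len(buf.getvalue().encode("utf-8"))

# "\u00e9" is a single character that takes two bytes in UTF-8.
for value in ("A", 42, 3.5, None, "\u00e9"):
    payload = len(str(value).encode("utf-8"))
    # The emitted count is the payload plus one newline byte in every case.
    print(repr(value), payload, emitted_byte_count(value))
```

The difference is one byte in each row, whatever the type or the byte width of the text.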

Upvotes: 3
