The D Merged

Reputation: 680

Perl tr/// "I don't get why it does this?"

I applied the following to my string $text:

$text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/cs;

This split the string into lines, which is what I wanted, but I don't understand why it works.

I thought this line would take every character in a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()-,.?!:; and replace each of them with \n.

I also don't understand what the cs at the end does. I found this explanation of the c and s modifiers, but I don't understand what it means:

"c - is used to specify that the SEARCHLIST character set is complemented"

"s - is used to specify that the sequences of characters that were transliterated to the same character are squashed down to a single instance of the character"

Example:

$text = "a ar? å ..";

gives

a
ar?
å
..
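A minimal self-contained reproduction of the example above, assuming the script is saved as UTF-8 (hence use utf8, so that letters such as å and œ are read as single characters rather than as bytes). The $count variable is an illustrative addition: tr/// returns the number of characters it matched, which here is the three spaces.

```perl
use strict;
use warnings;
use utf8;                               # assumption: source file is saved as UTF-8
binmode STDOUT, ':encoding(UTF-8)';     # print non-ASCII letters correctly

my $text = "a ar? å ..";

# /c: act on every character NOT in the list (in this string, only the spaces)
# /s: squash runs of the same replacement character down to one
my $count = ($text =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/cs);

print "$text\n";
print "characters matched: $count\n";
```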

Upvotes: 0

Views: 128

Answers (3)

ThisSuitIsBlackNot

Reputation: 24073

"c - is used to specify that the SEARCHLIST character set is complemented"

This means that every character not in the search list is replaced with \n. In your example, the only character not in the search list is the space, so each space is replaced with a newline. As Schwern pointed out, this is not a good way to do it.
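To see the complement on its own, here is a small sketch using a made-up string and a plain a-z list (both hypothetical, not from the question). Without /c the listed letters are replaced; with /c everything else is. When the replacement list is shorter than the search list, its last character is reused, so a single _ is enough.

```perl
use strict;
use warnings;

my $plain = "ab cd-ef";

# No /c: every character IN a-z becomes _
(my $listed = $plain) =~ tr/a-z/_/;

# /c: every character NOT in a-z (the space and the hyphen) becomes _
(my $complemented = $plain) =~ tr/a-z/_/c;

print "$listed\n";
print "$complemented\n";
```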

"s - is used to specify that the sequences of characters that were transliterated to the same character are squashed down to a single instance of the character"

This means that if several characters in a row are transliterated to the same character (for example, producing three \n in a row), they are "squashed" into a single \n. If you add some extra spaces to your example input, you can see this in action:

# Multiple spaces separating words
my $str = "a   ar?  å";

Without squashing:

$str =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/c;

Outputs:

a


ar?

å

With squashing:

$str =~ tr/a-zåàâäæçéèêëîïôöœßùûüÿA-ZÅÀÂÄÆÇÉÈÊËÎÏÔÖŒÙÛÜŸ'()\-,.?!:;/\n/cs;

Outputs:

a
ar?
å
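The s modifier also works without c. In this sketch (hypothetical strings; the bookkeeper case mirrors the example in perlop), an empty replacement list without /d means each letter maps to itself, so /s simply collapses runs of the same letter. The second line shows that different source characters mapped to the same target also count as one run.

```perl
use strict;
use warnings;

# Empty replacement list (no /d): letters map to themselves, /s collapses repeats
(my $word = "bookkeeper") =~ tr/a-zA-Z//s;

# a and b both become x, forming a run "xx" that /s squashes to "x"
(my $xs = "ab cd") =~ tr/a-z/x/s;

print "$word\n";
print "$xs\n";
```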

Upvotes: 3

Schwern

Reputation: 165248

If you want to turn all spaces into newlines, listing out all the things which are not spaces is cumbersome and you're likely to forget some. You can instead work directly on the spaces with a regex.

$text =~ s{\s+}{\n}g;

s{...}{...} is a "search and replace" that uses regular expressions rather than just lists of characters. \s is regex speak for "whitespace", which includes spaces, tabs and newlines. + means "one or more", so a run of several whitespace characters is turned into a single newline. The g modifier means "globally": replace every match in the string, otherwise it would stop after the first one.

foo bar     baz

Becomes

foo
bar
baz
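A runnable version of the substitution above. The second string is a hypothetical addition showing that tabs and newlines are covered by \s too, and that a mixed run such as " \n " still becomes one newline.

```perl
use strict;
use warnings;

my $line = "foo bar     baz";
(my $split = $line) =~ s{\s+}{\n}g;    # each whitespace run -> one newline

# Hypothetical input mixing a tab and a space/newline/space run
my $mixed = "foo\tbar \n baz";
(my $split_mixed = $mixed) =~ s{\s+}{\n}g;

print "$split\n";
print "$split_mixed\n";
```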

Upvotes: 4

Andrew Clark

Reputation: 208565

c - is used to specify that the SEARCHLIST character set is complemented

In this usage, "complemented" is similar to "negated" or "inverted": instead of replacing the characters listed in your expression, every character not found in your expression is replaced. In your example string, this means that all of the spaces are replaced with newlines, because every other character is in the set.
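Because the complement covers everything that is not listed, characters other than spaces are affected too once the input changes. This sketch uses a hypothetical string containing a number and a simplified ASCII-only version of the question's list: the digits are not in the list, so they are turned into newlines and squashed away along with the spaces, whereas the whitespace regex keeps them.

```perl
use strict;
use warnings;

my $input = "Room 101 is free";    # hypothetical input

# Complemented list (simplified, ASCII only): " 101 " is one run of
# unlisted characters, so it becomes a single \n and the number is lost
(my $via_tr = $input) =~ tr/a-zA-Z'()\-,.?!:;/\n/cs;

# Whitespace regex: only the spaces are replaced, the digits survive
(my $via_re = $input) =~ s{\s+}{\n}g;

print "$via_tr\n---\n$via_re\n";
```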

Upvotes: 5
