JoelFan

Reputation: 38714

How can I match at the beginning of any line, including the first, with a Perl regex?

According to the Perl documentation on regexes:

By default, the "^" character is guaranteed to match only the beginning of the string ... Embedded newlines will not be matched by "^" ... You may, however, wish to treat a string as a multi-line buffer, such that the "^" will match after any newline within the string ... you can do this by using the /m modifier on the pattern match operator.

The "after any newline" part means that it will only match at the beginning of the 2nd and subsequent lines. What if I want to match at the beginning of any line (1st, 2nd, etc.)?

EDIT: OK, it seems that the file has BOM information (3 chars) at the beginning and that's what's messing me up. Any way to get ^ to match anyway?

EDIT: So in the end it works (as long as there's no BOM), but now it seems that the Perl documentation is wrong, since it says "after any newline".

Upvotes: 6

Views: 2854

Answers (4)

Andrew_1510

Reputation: 13526

Put an empty line at the beginning of the file. This calms things down and avoids making the regex hard to read.

Yes, the BOM. It might appear at the beginning of the file, so put an empty line at the beginning of the file. The BOM is not matched by \s, and it is not something you can see with the naked eye. I have lost hours to a BOM making my regex fail.

Upvotes: -1

Eugene Yarmash

Reputation: 149813

You can use the /^(?:\xEF\xBB\xBF)?/mg regex to match at the beginning of the line anyway, if you want to preserve the BOM.

Upvotes: 3

kennytm

Reputation: 523334

The ^ does match the 1st line with the /m flag:

~:1932$ perl -e '$a="12\n23\n34";$a=~s/^/:/gm;print $a'
:12
:23
:34

To match when a BOM is present, you need to include it in the pattern:

~:1939$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(\d)/<$1>:/mg;print $a'
12
<2>:3
<3>:4
~:1940$ perl -e '$a="\xEF\xBB\xBF12\n23\n34";$a=~s/^(?:\xEF\xBB\xBF)?(\d)/<$1>:/mg;print $a'
<1>:2
<2>:3
<3>:4

Upvotes: 5

Jonathan Leffler

Reputation: 754010

Conceptually, there's assumed to be a newline before the beginning of the string. Consequently, /^a/ will find a letter 'a' at the beginning of a string.

Upvotes: 1
