rsandell

Reputation: 23

Matching the First Character on Each Line (UNIX egrep)

I'm looking to match and return just the first character of each line in a plain-text, UTF-8-encoded file, using egrep in a UNIX terminal. I presumed that the following egrep command with a simple RegEx would produce the desired result:

egrep -o "^." FILE.txt

However, the output appears to be matching and returning every character in the file; that is, it is behaving as if the command were:

egrep -o "." FILE.txt

Similar results occur with the following command,

egrep -o "^[a-z]" FILE.txt

namely, it behaves as if the RegEx "[a-z]" had been supplied (i.e., every lowercase ASCII letter in the range a-z is matched).

Commands in which just one specific alphanumeric character is supplied seem, as expected, to match every line that begins with that character, e.g.,

egrep -o "^1" FILE.txt

or

egrep -o "^T" FILE.txt

return all lines beginning with "1" or "T", respectively.

I have tried pasting the entire file into a RegEx tester, such as the one at https://regexr.com/, and there the expression "^." behaves as expected, so I don't think my file contains any stray whitespace characters that could be interfering.

Is there some other behavior of the line-beginning metacharacter "^" with egrep that could be causing this problem?

Upvotes: 2

Views: 3884

Answers (1)

randomir

Reputation: 18687

This is a known bug in BSD grep and GNU grep 2.5.1-FreeBSD (also discussed here).

In -o mode, the ^ anchor isn't handled properly (reported here, patched here):

$ echo abc | bsdgrep -o "^."
a
b
c
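Until a patched grep is available, one possible workaround (a sketch, not something the answer itself proposes) is to do the anchored match with sed, which applies the substitution at most once per line and so should not be affected by this -o bug. The sample input is invented for illustration:

```shell
# Print the first character of each non-empty line.
# -n suppresses sed's default output; the trailing p prints only lines
# where the substitution succeeded, so empty lines produce no output,
# just as a correctly working grep -o "^." would behave.
printf 'abc\n\nTxyz\n1foo\n' | sed -n 's/^\(.\).*/\1/p'
# prints a, T and 1, one per line
```

Against the file from the question this would be sed -n 's/^\(.\).*/\1/p' FILE.txt. In a UTF-8 locale, . should match one whole character rather than a single byte, though that depends on the local sed and locale settings.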

GNU grep on Linux behaves as expected:

$ echo abc | grep -o "^."
a
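If it is unclear which grep comes first on your PATH, the following sketch reruns the same one-line test and reports which behavior you get (the message strings are made up). Running grep --version is another way to tell GNU grep and BSD grep apart.

```shell
# Feed the single line "abc" to grep -o with an anchored pattern.
# A correct grep prints only "a"; an affected grep prints a, b and c.
out=$(printf 'abc\n' | grep -o '^.')
if [ "$out" = "a" ]; then
  echo "grep handles ^ correctly in -o mode"
else
  echo "grep is affected by the -o/^ anchor bug"
fi
```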

For what you are trying to achieve here (printing the first character of every line), grep is overkill. A simple cut would suffice:

$ echo abc | cut -c1
a
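Two differences from grep -o "^." are worth knowing before switching to cut. First, cut -c1 prints an empty line for every empty input line, whereas grep -o prints nothing for them. Second, GNU cut has historically treated -c the same as -b, so a multibyte UTF-8 first character can be truncated to its first byte. The sketch below (with invented sample input) shows the first difference and an awk alternative that skips empty lines; in gawk under a UTF-8 locale substr() counts characters, while some other awk implementations count bytes.

```shell
# Sample input: a normal line, an empty line, and another line.
input='abc

xyz'

# cut keeps the empty line, so this prints three lines: a, (empty), x
printf '%s\n' "$input" | cut -c1

# awk variant that skips empty lines, like grep -o "^.": prints a, x
printf '%s\n' "$input" | awk 'length($0) > 0 { print substr($0, 1, 1) }'
```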

Upvotes: 1
