Reputation: 23
I'm looking to match and return just the first character from each line in a plain-text UTF-8 encoded file, using egrep in a UNIX terminal. I presumed that the following egrep command with a simple RegEx would produce the desired result:
egrep -o "^." FILE.txt
However, the output appears to be matching and returning every character in the file; that is, it is behaving as if the command were:
egrep -o "." FILE.txt
Similar results occur with the following command,
egrep -o "^[a-z]" FILE.txt
namely, the results act as if the RegEx "[a-z]" were supplied (i.e., every lowercase ASCII character in the range a-z is matched).
Commands in which just one specific alphanumeric character is supplied seem, as expected, to return every line that begins with the specific character, e.g.,
egrep -o "^1" FILE.txt
or egrep -o "^T" FILE.txt
return all lines beginning with "1" or "T", respectively.
I have tried pasting the entirety of the file into a RegEx tester, such as at https://regexr.com/, and the expression "^." indeed behaves as expected, so I don't think that my file has any further whitespace characters that could be interfering.
Is there some other behavior of the line-beginning metacharacter "^" with egrep that could be causing this problem?
Upvotes: 2
Views: 3884
Reputation: 18687
This is a known bug in BSD grep and GNU grep 2.5.1-FreeBSD (also discussed here). In -o mode, the ^ anchor isn't handled properly (reported here, patched here):
$ echo abc | bsdgrep -o "^."
a
b
c
GNU grep on Linux behaves as expected:
$ echo abc | grep -o "^."
a
As for what you are actually trying to achieve here (printing the first character of every line), grep is overkill. A simple cut would suffice:
$ echo abc | cut -c1
a
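One caveat, since the file is UTF-8: GNU cut documents -c as currently equivalent to -b, so on many Linux systems it selects the first byte rather than the first character. If the file may contain multibyte characters, a sed substitution is one possible alternative. The following is only a sketch, assuming a sed whose "." matches a full character in the current UTF-8 locale; first_chars is a hypothetical helper name used for illustration:

```shell
# Hypothetical helper: print only the first character of each input line.
# The anchored pattern captures one character and discards the rest of the
# line, so it does not rely on grep -o handling "^" correctly.
# Empty lines do not match and are passed through unchanged.
first_chars() {
    sed 's/^\(.\).*$/\1/' "$@"
}

printf 'abc\n123\nTuv\n' | first_chars
```

It can also be given a file name directly, e.g. first_chars FILE.txt.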
Upvotes: 1