Reputation: 3847
I run git grep "\<blah\>" regularly on my Linux development server, but I just discovered that I am not able to use \< and \> on Mac (Mac OS X 10.6.8): the same command does not find anything. Is the regular expression syntax different on Mac?
I tried using git grep -E "\<blah\>", but to no avail! :-(
Upvotes: 10
Views: 3091
Reputation: 1328182
If you do use -P, make sure to use Git 2.40 (Q1 2023): "grep -P" learned to use Unicode Character Property to grok character classes when processing \b, \w, etc.
See commit acabd20 (08 Jan 2023) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit 557d93a, 27 Jan 2023)
grep: correctly identify utf-8 characters with \{b,w} in -P
Signed-off-by: Carlo Marcelo Arenas Belón
Acked-by: Ævar Arnfjörð Bjarmason
When UTF is enabled for a PCRE match, the corresponding flags are added to the pcre2_compile() call, but PCRE2_UCP wasn't included.
This prevents extending the meaning of the character classes to include those new valid characters and therefore results in failed matches for expressions that rely on that extension, for example:
$ git grep -P '\bÆvar'
Add PCRE2_UCP so that \w will include Æ and therefore \b could correctly match the beginning of that word.
This has an impact on performance that has been estimated to be between 20% and 40% and that is shown through the added performance test.
That means patterns like these will work with non-ASCII word characters too:
'\bhow'
'\bÆvar'
'\d+ \bÆvar'
'\bBelón\b'
'\w{12}\b'
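A quick way to see whether a given git build behaves like this is to run one of the patterns above against a scratch file. This is only a sketch: the helper name ucp_probe is hypothetical, it assumes the locale name C.UTF-8 exists on your system (substitute en_US.UTF-8 or similar if it does not), and it relies on git exiting with status 128 when it dies, for example when it was built without PCRE support.

```shell
#!/bin/sh
# Sketch: does this git build apply Unicode rules to \b under -P?
# Assumptions: C.UTF-8 is an available UTF-8 locale; ucp_probe is a made-up name.
ucp_probe() {
  tmp=$(mktemp -d)
  printf 'Ævar\n' > "$tmp/names.txt"
  # --no-index lets git grep search a plain directory outside any repository.
  (cd "$tmp" && LC_ALL=C.UTF-8 git grep --no-index -q -P '\bÆvar' -- names.txt) 2>/dev/null
  status=$?
  rm -rf "$tmp"
  case $status in
    0) echo "unicode" ;;     # Æ counted as a word character, so \b matched
    1) echo "ascii-only" ;;  # ran, but no match
    *) echo "no-pcre" ;;     # git died, e.g. built without PCRE support
  esac
}

ucp_probe
```

A result of ascii-only does not pinpoint the cause: a git older than 2.40, a non-UTF-8 locale, or (from Git 2.41 on) a PCRE2 library at or below 10.34 would all look the same.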
With Git 2.41 (Q2 2023), a recent-ish change to allow Unicode character classes to be used with "grep -P" triggered a JIT bug in older pcre2 libraries.
The problematic change in Git built with these older libraries has been disabled to work around the bug.
See commit 14b9a04 (23 Mar 2023) by Mathias Krause (mathiaskrause).
(Merged by Junio C Hamano -- gitster -- in commit d35cd54, 30 Mar 2023)
grep: work around UTF-8 related JIT bug in PCRE2 <= 10.34
Reported-by: Stephane Odul
Signed-off-by: Mathias Krause
Stephane is reporting a regression introduced in Git v2.40.0 that leads to 'git grep'(man) segfaulting in his CI pipeline.
It turns out, he's using an older version of libpcre2 that triggers a wild pointer dereference in the generated JIT code that was fixed in PCRE2 10.35.
Instead of completely disabling the JIT compiler for the buggy version, just mask out the Unicode property handling as we used to do prior to commit acabd20 ("grep: correctly identify utf-8 characters with \{b,w} in -P", 2023-01-08, Git v2.40.0-rc0 -- merge listed in batch #11).
Git 2.48 (Q1 2025), batch 7, fixes another issue with 'git grep'(man): a regression on macOS, fixed by disabling lookahead when encountering invalid UTF-8 byte sequences.
See commit ce025ae (20 Oct 2024) by René Scharfe (rscharfe).
(Merged by Taylor Blau -- ttaylorr -- in commit 43ac239, 01 Nov 2024)
grep: disable lookahead on error
Reported-by: David Gstir
Signed-off-by: René Scharfe
Tested-by: David Gstir
Signed-off-by: Taylor Blau
regexec(3) can fail.
For example, on macOS it fails if it is used with a UTF-8 locale to match a valid regex against a buffer containing invalid UTF-8 characters.
git grep(man) has two ways to search for matches in a file:
- Either it splits its contents into lines and matches them separately,
- or it matches the whole content and figures out line boundaries later.
The latter is done by look_ahead() and it's quicker in the common case where most files don't contain a match.
Fall back to line-by-line matching if look_ahead() encounters a regexec(3) error by propagating errors out of patmatch() and bailing out of look_ahead() if there is one.
This way we can at least find matches in lines that contain only valid characters.
That matches the behavior of grep(1) on macOS.
pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8 characters gracefully.
So implement the fall-back only for regexec(3) and leave the PCRE2 matching unchanged.
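The situation this commit describes can be approximated with a scratch file that puts an invalid UTF-8 byte on one line and a plain ASCII match on another. This is a sketch: the file name, its contents, the helper name demo_invalid_utf8, and the locale name en_US.UTF-8 are illustrative assumptions. The bracket expression in bl[a]h is there so git should treat the pattern as a regular expression rather than a fixed string. On a git that includes this fix, or on a system whose regexec(3) tolerates the invalid byte, the second line should be reported.

```shell
#!/bin/sh
# Sketch: line 1 holds an invalid UTF-8 byte (0xFF, octal 377), line 2 is ASCII.
# File name, contents, locale and helper name are illustrative assumptions.
demo_invalid_utf8() {
  tmp=$(mktemp -d)
  printf 'caf\377\nblah\n' > "$tmp/mixed.txt"
  # 'bl[a]h' is not a fixed string, so git should pass it to regcomp/regexec.
  (cd "$tmp" && LC_ALL=en_US.UTF-8 git grep --no-index -n 'bl[a]h' -- mixed.txt)
  rm -rf "$tmp"
}

demo_invalid_utf8
```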
Upvotes: 0
Reputation: 16195
I guess it's caused by the difference between the BSD and Linux (GNU) regex libraries.
See if the -w (match pattern only at word boundary) option to git grep does it for you:
$ git grep -w blah
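To see what -w changes, here is a small comparison against a scratch file. It is a sketch: the file contents and the helper name w_demo are made up, and --no-index is used only so the example does not need a repository.

```shell
#!/bin/sh
# Sketch: plain matching versus -w (whole-word) matching.
# File name, contents and helper name are illustrative.
w_demo() {
  tmp=$(mktemp -d)
  printf 'blah\nblahblah\nfoo blah bar\n' > "$tmp/sample.txt"
  echo "without -w:"
  (cd "$tmp" && git grep --no-index -n 'blah' -- sample.txt)
  echo "with -w:"
  (cd "$tmp" && git grep --no-index -n -w 'blah' -- sample.txt)
  rm -rf "$tmp"
}

w_demo
```

With -w, blahblah on line 2 is skipped because blah is not a whole word there, while lines 1 and 3 still match.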
Upvotes: 8
Reputation: 33658
After struggling with this, too, I found this very helpful post on a BSD mailing list. So here's the (albeit rather ugly) solution:
git grep "[[:<:]]blah[[:>:]]"
The -w flag of git grep also works, but sometimes you only want to match the beginning or end of a word.
Update: This has changed in OS X 10.9 "Mavericks". Now you can use \<, \>, and \b. [[:<:]] and [[:>:]] are no longer supported.
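If one script has to run on systems on both sides of that change, one option is to probe which syntax the local regex library accepts before building the pattern. A minimal sketch, assuming a POSIX shell; boundary_style and word_grep are hypothetical helper names, only the \< \> and [[:<:]] [[:>:]] forms are probed, and the search term is inserted into the pattern without escaping, so regex metacharacters in it are not neutralized.

```shell
#!/bin/sh
# Sketch: detect which word-boundary syntax this git grep understands.
# boundary_style and word_grep are hypothetical helper names.
boundary_style() {
  tmp=$(mktemp -d)
  printf 'x blah y\n' > "$tmp/probe.txt"
  if (cd "$tmp" && git grep --no-index -q '\<blah\>' -- probe.txt) 2>/dev/null; then
    style=gnu    # \< and \> work (Linux, OS X 10.9 and later)
  elif (cd "$tmp" && git grep --no-index -q '[[:<:]]blah[[:>:]]' -- probe.txt) 2>/dev/null; then
    style=bsd    # older BSD-style syntax (e.g. OS X 10.6)
  else
    style=none   # neither form worked
  fi
  rm -rf "$tmp"
  echo "$style"
}

# Whole-word search in the current repository; $1 is used as a regex, unescaped.
word_grep() {
  case $(boundary_style) in
    gnu) git grep "\\<$1\\>" ;;
    bsd) git grep "[[:<:]]$1[[:>:]]" ;;
    *)   git grep -w "$1" ;;
  esac
}

boundary_style
```

Falling back to -w keeps the search working when neither syntax is accepted, at the cost of not being able to anchor only the start or only the end of a word.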
Upvotes: 11
Reputation: 71
You can compile git with PCRE support and use git grep -P "\bblah\b" for word boundaries.
Here's a guide on how to compile git using OSX Homebrew: http://realultimateprogramming.blogspot.com/2012/01/how-to-enable-git-grep-p-on-os-x-using.html
Upvotes: 5