bright-star

Reputation: 6447

What's the most sensible way to emulate lookaround behavior in Rust regex?

The Rust regex crate states:

This crate provides a native implementation of regular expressions that is heavily based on RE2 both in syntax and in implementation. Notably, backreferences and arbitrary lookahead/lookbehind assertions are not provided.

As of this writing, "rust regex lookbehind" comes back with no results from DuckDuckGo.

I've never had to work around this before, but I can think of two approaches:

Approach 1 (forward)

  1. Iterate over .captures() for the pattern I want to use as lookbehind.
  2. Match the thing I actually want in the text between those captures.

Approach 2 (reverse)

  1. Match the pattern I really want to match.
  2. For each match, look for the lookbehind pattern until the end byte of a previous capture or the beginning of the string.

Not only does this seem like a huge pain, it also seems like a lot of edge cases are going to trip me up. Is there a better way to go about this?

Example

Given a string like:

"Fish33-Tiger2Hyena4-"

I want to extract ["33-", "2", "4-"] iff each one follows a string like "Fish".

Upvotes: 19

Views: 7906

Answers (2)

BurntSushi5

Reputation: 15354

Without a motivating example, it's hard to usefully answer your question in a general way. In many cases, you can substitute lookaround operators with two regexes: one to search for candidates and another to produce the actual match you're interested in. However, this approach isn't always feasible.

If you're truly stuck, then your only option is to use a regex library that supports these features. Rust has bindings to a couple of them:

There is also a more experimental library, fancy-regex, which is built on top of the regex crate.

Upvotes: 17

bright-star

Reputation: 6447

If you have a regex application with a known, consistent pattern that you want to use as a lookbehind, another workaround is to call .split() with the lookbehind-matching pattern as the regex (similar to the idea mentioned in the other answer). That at least gives you the substrings that sit directly after each occurrence of the pattern you wanted to look behind for.

I don't know about performance guarantees regex-wise but this at least means that you can do a lookbehind-free regex match on the split result either N times (for N splits), or once on the concatenated result as needed.

Upvotes: 5
