Reputation: 1223
I normally sling regexes like it's a native language, but I'm stumped by this puzzle today. I need to capture all the text of a string except for the final hashtag. Any hashtags except for the final one should be included, and it also needs to match if there are no hashtags at all.
Test Case 1:
Foo bar #baz
Foo bar
Test Case 2:
Foo bar #baz #qux
Foo bar #baz
Test Case 3:
Foo bar
Foo bar
Because of the environment I'm using this in (Zapier), I have a tight constraint that I need the matching string in a single capturing group with the same group number regardless of the case. Zapier uses the Python engine, FWIW.
The context is posting photos from Instagram automatically to Twitter, but needing to limit the length to 280 characters. Since Zapier's truncate function doesn't allow cutting on clean word boundaries, there's the chance that 280 characters could run out in the middle of a hashtag, potentially leading to an embarrassing result when Twitter auto-links it. (Zapier's truncate does allow appending an ellipsis, which mitigates the issue for regular words.) Since it's not critical to include every hashtag, I want to throw away the final one, in case it's been truncated.
Upvotes: 1
Views: 236
Reputation:
You can use an unrolled loop method.
This is probably the fastest way to do it.
[^#]*(?:\#(?![^#]*$)[^#]*)*
see https://regex101.com/r/vlEows/1/tests
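For anyone who wants to try this outside regex101, here is a minimal Python sketch. The function name `drop_last_hashtag` is made up for illustration. Note that the pattern has no explicit capturing group, so the result is the whole match (group 0), and the whitespace before the removed hashtag stays in the result.

```python
import re

# Unrolled-loop pattern from the answer above. It has no explicit
# capturing group, so the whole match (group 0) is the result.
PATTERN = re.compile(r"[^#]*(?:\#(?![^#]*$)[^#]*)*")


def drop_last_hashtag(text: str) -> str:
    """Return the text up to, but not including, the final '#'.

    Hypothetical helper name. The pattern can match the empty string,
    so match() always succeeds here.
    """
    return PATTERN.match(text).group(0)


if __name__ == "__main__":
    for sample in ("Foo bar #baz", "Foo bar #baz #qux", "Foo bar"):
        # repr() makes the trailing space visible
        print(repr(drop_last_hashtag(sample)))
```

If the trailing space matters, calling `.rstrip()` on the result is one way to remove it.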
Upvotes: 1
Reputation: 110685
You could match the following regular expression, which conditions on whether the string ends with a hashtag.
^(?:(?=.*#\w+$).*(?=#\w+$)|.*)
If you need a capture group, use $0, which holds the complete match.
The regex engine performs the following operations.
^ : match beginning of string
(?: : begin non-capture group
(?=.*#\w+$) : positive lookahead asserts that the string ends with a hashtag
.* : match 0+ characters
(?=#\w+$) : positive lookahead asserts that the next character begins a hashtag at the end of the string
| : or
.* : match 0+ characters
) : end non-capture group
One could alternatively remove the non-capture group and repeat the beginning-of-string anchor:
^(?=.*#\w+$).*(?=#\w+$)|^.*
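A small Python sketch of the first form, assuming the whole match is read via group 0 (what the answer calls $0). The helper name `text_before_final_hashtag` is invented for this example. As written, `.` does not match newlines, so this sketch only considers single-line input.

```python
import re

# First pattern from the answer above: if the string ends with a hashtag,
# match everything before it; otherwise match the whole line.
PATTERN = re.compile(r"^(?:(?=.*#\w+$).*(?=#\w+$)|.*)")


def text_before_final_hashtag(text: str) -> str:
    """Return group 0 of the match (hypothetical helper name).

    The second alternative (.*) can match the empty string, so match()
    always succeeds.
    """
    return PATTERN.match(text).group(0)


if __name__ == "__main__":
    for sample in ("Foo bar #baz", "Foo bar #baz #qux", "Foo bar"):
        # repr() makes the trailing space visible
        print(repr(text_before_final_hashtag(sample)))
```

A hashtag that is not at the very end of the string (for example "Foo #baz bar") is left alone, because the first lookahead only succeeds when the string ends with a hashtag.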
Upvotes: 1
Reputation: 1223
Just about as soon as I finished typing this out, I found my own solution (yay, rubber-ducking 🐤 it). Figured I'd post it for anybody else needing this specific strange solution:
((^[^#]+$)|(?:.|\n)+)(?(2)|\s#[^#]+)
Test results: https://regex101.com/r/RNGVSL/2/tests
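Roughly how that behaves in plain Python, as a sketch. The function name is made up, and returning the input unchanged when nothing matches is my own assumption, not part of the original pattern. The wanted text is always in group 1; group 2 only exists to drive the conditional.

```python
import re

# Conditional pattern from above. Group 2 is set only when the text has
# no '#' at all; (?(2)|...) then skips the hashtag part in that case.
PATTERN = re.compile(r"((^[^#]+$)|(?:.|\n)+)(?(2)|\s#[^#]+)")


def strip_last_hashtag(text: str) -> str:
    """Return group 1, or the input unchanged if there is no match.

    The fallback is an assumption for this sketch: the pattern requires
    whitespace before the final hashtag, so a string such as "#baz"
    does not match at all.
    """
    match = PATTERN.match(text)
    return match.group(1) if match else text


if __name__ == "__main__":
    for sample in ("Foo bar #baz", "Foo bar #baz #qux", "Foo bar"):
        print(repr(strip_last_hashtag(sample)))
```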
Update
Simpler answer from Wiktor Stribiżew in comments:
(?s)^(.*?)(?:\s*#[^\s#]+)?$
Test results: https://regex101.com/r/RNGVSL/3/tests
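And a quick Python sketch of the simpler pattern, including a caption whose last hashtag was cut off mid-word. The helper name and the sample strings beyond the three test cases are illustrative only. Group 1 holds the result in every case, which is what the Zapier constraint needs.

```python
import re

# Simpler pattern from the update above. (?s) lets '.' match newlines,
# and the lazy group 1 stops right before an optional final hashtag.
PATTERN = re.compile(r"(?s)^(.*?)(?:\s*#[^\s#]+)?$")


def without_final_hashtag(text: str) -> str:
    """Return group 1 (hypothetical helper name).

    The lazy group can grow to cover the whole string, so the pattern
    matches any input.
    """
    return PATTERN.match(text).group(1)


if __name__ == "__main__":
    samples = (
        "Foo bar #baz",
        "Foo bar #baz #qux",
        "Foo bar",
        "Foo bar #baz #qu",  # last hashtag truncated mid-word
    )
    for sample in samples:
        print(repr(without_final_hashtag(sample)))
```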
Upvotes: 1