Randhir Singh

Reputation: 41

Regex return specified number of characters before and after match

I need to extract a specified number of characters before and after a regex match. For example:
Input : ABCDEFGHIJK//MNOPQRST Output : IJK//MNOPQ

Input : zzzABCDEFGHIJK//MNOPQRST Output :

I want only the 3 characters immediately before "//" and the 5 characters immediately after "//". Lines that start with zzz should be excluded.

The code I am currently using to search for //:

^(?!.*zzz)?=.{0,3}//.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000} --- Not working
(?=.{0,3}//.{0,100}[a-zA-Z0-9])(?=\S+$).{2,5000} --- Working

https://regex101.com/r/ry6Y09/1 --- Regex demo

I need to be able to specify the limits.

Upvotes: 4

Views: 4569

Answers (1)

Wiktor Stribiżew

Reputation: 626853

To get three chars before // and five chars after //, you can use

.{0,3}//.{0,5}
.{3}//.{5}

See the regex demo #1 and regex demo #2.

Mind that .{0,3}//.{0,5} is good to use when you expect matches that have fewer chars before // and after //, just because they are close to the start / end of string.

The .{3}//.{5} regex will not match in an ab//abcde string, for example, as it requires exactly three chars before // and exactly five chars after it.

Depending on how you declare the regex, you might need to escape /.
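As a small illustration, assuming Python (the question does not name a language): there the pattern is an ordinary string, so / needs no escaping, although the escaped form \/ is accepted as well. In a slash-delimited regex literal, such as in JavaScript, the slashes would have to be written as \/\/. The names plain, escaped and sample are only for this sketch.

```python
import re

# In Python the pattern is a plain (raw) string, so "/" is not special.
plain = re.compile(r".{3}//.{5}")

# The escaped form matches the same text; "\/" is simply a literal "/".
escaped = re.compile(r".{3}\/\/.{5}")

sample = "ABCDEFGHIJK//MNOPQRST"
print(plain.search(sample).group(0))
print(escaped.search(sample).group(0))
```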

More details:

  • .{0,3} - zero to three chars other than line break chars
  • .{3} - three chars other than line break chars
  • // - a // string
  • .{5} - five chars other than line break chars
  • .{0,5} - zero to five chars other than line break chars
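The difference between the two patterns can be sketched in Python (chosen here only for illustration; the question does not say which language is used). The extract helper is a hypothetical name, not part of any library.

```python
import re

# Flexible: up to three chars before "//" and up to five after.
FLEXIBLE = re.compile(r".{0,3}//.{0,5}")

# Strict: exactly three chars before "//" and exactly five after.
STRICT = re.compile(r".{3}//.{5}")


def extract(pattern, text):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(extract(FLEXIBLE, "ABCDEFGHIJK//MNOPQRST"))  # IJK//MNOPQ
print(extract(STRICT, "ABCDEFGHIJK//MNOPQRST"))    # IJK//MNOPQ

# Only two chars precede "//", so the strict pattern finds nothing.
print(extract(FLEXIBLE, "ab//abcde"))  # ab//abcde
print(extract(STRICT, "ab//abcde"))    # None
```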

Now, answering your edit and comment, if you want to extract a .{3}//.{5} substring from a string that does not start with zzz and consists only of 2 to 5000 non-whitespace chars, you can use

^(?!zzz)(?=\S{2,5000}$).*(.{3}//.{0,100})(?!\w)
^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{0,100})(?!\w)

Grab Group 1. See the regex demo. Details:

  • ^ - start of string
  • (?!zzz) - no zzz allowed at the start of a string
  • (?=\S{2,5000}$) - the string must only consist of two to 5000 non-whitespace chars
  • .*? - match/consume any zero or more chars other than line break chars, as few as possible (.* consumes as many as possible)
  • (.{3}//.{0,100}) - any 3 chars other than line break chars, //, and any 0 to 100 chars other than line break chars
  • (?!\w) - not followed by a word char. Remove if this check is not required.
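A minimal Python sketch of grabbing Group 1 with the two patterns above (Python is an assumption; group1 is a hypothetical helper name). The CAPPED pattern is my own variation, not part of the answer: it swaps .{0,100} for .{5} and drops the (?!\w) check so that exactly five chars after // are captured.

```python
import re

# The two patterns from the answer: lazy (.*?) and greedy (.*) prefix.
LAZY = re.compile(r"^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{0,100})(?!\w)")
GREEDY = re.compile(r"^(?!zzz)(?=\S{2,5000}$).*(.{3}//.{0,100})(?!\w)")

# Assumption: a variation that caps the tail at exactly five chars.
CAPPED = re.compile(r"^(?!zzz)(?=\S{2,5000}$).*?(.{3}//.{5})")


def group1(pattern, text):
    """Return capture group 1 of the first match, or None if no match."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(group1(LAZY, "ABCDEFGHIJK//MNOPQRST"))     # IJK//MNOPQRST
print(group1(LAZY, "zzzABCDEFGHIJK//MNOPQRST"))  # None (starts with zzz)
print(group1(LAZY, "ABC DEFGHIJK//MNOPQRST"))    # None (contains whitespace)

# With several "//" in the string, lazy stops at the first one,
# while greedy backtracks from the end and lands on the last one.
print(group1(LAZY, "abc//def//ghi"))    # abc//def//ghi
print(group1(GREEDY, "abc//def//ghi"))  # def//ghi

print(group1(CAPPED, "ABCDEFGHIJK//MNOPQRST"))   # IJK//MNOPQ
```

Note that with .{0,100} the group holds everything after // up to 100 chars, so the sample input yields IJK//MNOPQRST rather than the five-char tail.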

Upvotes: 3
