Nitin

Reputation: 476

Regex to extract the first 5 letters/digits after the last hyphen

I am trying to extract the first 5 characters (letters or digits) after the last hyphen, keeping everything before them. Here are two examples:

  1. String -- X008-TGa19-ER751QF7

Output -- X008-TGa19-ER751

  2. String -- X002-KF13-ER782cPU80

Output -- X002-KF13-ER782

My attempt -- I managed to match the element after the last hyphen with (\w+)[^-.]*$

But how do I take only its first 5 characters and then return the entire value as the output, as shown in the examples?

Upvotes: 0

Views: 1211

Answers (4)

The fourth bird

Reputation: 163577

You can match 1+ word chars from the start of the string and optionally repeat a - followed by 1+ word chars. Then match the last - and 5 word chars.

^\w+(?:-\w+)*-\w{5}
  • ^ Start of string
  • \w+ Match 1+ word chars
  • (?:-\w+)* Optionally repeat - and 1+ word chars
  • -\w{5} Match - and 5 word chars

Regex demo

import re

regex = r"^\w+(?:-\w+)*-\w{5}"
s = ("X008-TGa19-ER751QF7\n"
    "X002-KF13-ER782cPU80")
    
print(re.findall(regex, s, re.MULTILINE))

Output

['X008-TGa19-ER751', 'X002-KF13-ER782']

Note that \w can also match _.

If the string can also contain other characters, and you want the first 5 letters or digits (excluding _) after the last hyphen, you can match word characters without the underscore using a negated character class [^\W_]

Repeat that 5 times while asserting that there are no more hyphens to the right.

^.*-[^\W_]{5}(?=[^-]*$)

Regex demo
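
As a rough check of the second pattern in Python, here is a minimal sketch. The third and fourth sample strings are made up for illustration (they are not from the question): one contains _ and . before the last hyphen, the other has fewer than 5 characters after it.

```python
import re

# Second pattern from this answer: greedy prefix, then a hyphen,
# then 5 letters/digits, with a lookahead that forbids later hyphens.
pattern = re.compile(r"^.*-[^\W_]{5}(?=[^-]*$)")

samples = [
    "X008-TGa19-ER751QF7",   # from the question
    "X002-KF13-ER782cPU80",  # from the question
    "X0_8-TG.19-ER751QF7",   # hypothetical: other characters before the last hyphen
    "X008-TGa19-ER7",        # hypothetical: only 3 characters after the last hyphen
]

results = []
for s in samples:
    m = pattern.search(s)
    results.append(m.group(0) if m else None)

print(results)
# ['X008-TGa19-ER751', 'X002-KF13-ER782', 'X0_8-TG.19-ER751', None]
```

The last sample gives no match because the lookahead rejects any hyphen other than the last one.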

Upvotes: 1

Adrian Shum

Reputation: 40066

^(.*-[^-]{5})[^-]*$ Capture group 1 is what you need

https://regex101.com/r/SYz9i5/1

Explanation

^(.*-[^-]{5})[^-]*$
^                    Start of line
 (                   Capture group 1 start
  .*                 Any number of any character
    -                hyphen
     [^-]{5}         5 non-hyphen characters
            )        Capture group 1 end 
             [^-]*   Any number of non-hyphen characters
                  $  End of line

Another simpler one is

^(.*-.{5}).*$

This should be quite straight-forward.

This makes use of the greedy behaviour of the first .*, which tries to match as much as possible, so the - it ends on is the last one that has at least 5 characters following it. https://regex101.com/r/CFqgeF/1/
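
For what it's worth, here is a small Python sketch of both patterns side by side. The extract helper and the short input "X008-TGa19-ER7" are made up for illustration. The short input shows where the two differ: the simpler pattern falls back to an earlier hyphen, while the first one refuses to match.

```python
import re

strict = re.compile(r"^(.*-[^-]{5})[^-]*$")  # hyphen must be the last one
loose = re.compile(r"^(.*-.{5}).*$")         # last hyphen with 5+ chars after it


def extract(pattern, text):
    """Return capture group 1, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group(1) if m else None


print(extract(strict, "X008-TGa19-ER751QF7"))  # X008-TGa19-ER751
print(extract(loose, "X008-TGa19-ER751QF7"))   # X008-TGa19-ER751

# Hypothetical input with only 3 characters after the last hyphen:
print(extract(strict, "X008-TGa19-ER7"))       # None
print(extract(loose, "X008-TGa19-ER7"))        # X008-TGa19
```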

Upvotes: 1

Moinuddin Quadri

Reputation: 48110

If you are open to a non-regex solution, you can use this one, which is based on splitting, slicing and joining the strings:

>>> my_str = "X008-TGa19-ER751QF7"

>>> '-'.join(s[:5] for s in my_str.split('-'))
'X008-TGa19-ER751'

Here I am splitting the string on the hyphen -, slicing each sub-string to get at most five chars, and joining them back using str.join() to get the string in your desired format.
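
One thing to keep in mind: the slice is applied to every segment, so an earlier segment longer than five characters gets shortened too. If only the part after the last hyphen should be trimmed, a possible variant uses str.rpartition. This is a sketch, not part of the original answer; the function name trim_last_segment and the input "LONGPREFIX-AB-ER751QF7" are made up for illustration.

```python
def trim_last_segment(text, keep=5):
    """Keep everything up to the last hyphen plus `keep` chars after it."""
    # rpartition splits on the LAST hyphen: (head, "-", tail).
    # With no hyphen at all it returns ("", "", text), so the
    # result is then just the first `keep` characters of text.
    head, sep, tail = text.rpartition("-")
    return head + sep + tail[:keep]


print(trim_last_segment("X008-TGa19-ER751QF7"))     # X008-TGa19-ER751
print(trim_last_segment("LONGPREFIX-AB-ER751QF7"))  # LONGPREFIX-AB-ER751

# The split/slice/join approach shortens the first segment as well:
print('-'.join(s[:5] for s in "LONGPREFIX-AB-ER751QF7".split('-')))  # LONGP-AB-ER751
```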

Upvotes: 1

Terry Spotts

Reputation: 4075

(\w+-\w+-\w{5}) seems to capture what you're asking for.
Example: https://regex101.com/r/PcPSim/1
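
A quick Python check of this pattern, as a sketch only. The first_match helper and the last two inputs are made up for illustration. The pattern expects three hyphen-separated segments, so it fits the strings in the question, but with a different number of hyphens it either matches the first three segments or nothing.

```python
import re

pattern = re.compile(r"(\w+-\w+-\w{5})")


def first_match(text):
    """Return the first match of the pattern in text, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(first_match("X008-TGa19-ER751QF7"))     # X008-TGa19-ER751
print(first_match("X002-KF13-ER782cPU80"))    # X002-KF13-ER782

# Hypothetical inputs with a different number of hyphens:
print(first_match("X1-X008-TGa19-ER751QF7"))  # X1-X008-TGa19
print(first_match("X008-ER751QF7"))           # None
```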

Upvotes: 1
