Regex to extract date and specific string

Question

I have the filename below, and I want to extract the year and the _TEXT part.

fle_2019-11-17A17-21-09.01(_TEXT).txt

I am able to do this using two regexes and then joining the results.

(?<=\_)(\d{4})(?=\-) This gives me year

(?<=\()(.*)(?=\)) This gives me _TEXT

Is there a way to get this from a single expression?

The fourth bird · Accepted Answer

One option is to use 2 capturing groups. Depending on what you would allow to match before the first underscore, you could for example use a character class to match word characters without an underscore [^\W_]+

^[^\W_]+_(\d{4})-[\w.-]+\(([^)]+)\)\.\w+$

In parts

^ Start of string
[^\W_]+ Match 1+ word chars except _
_ Match the _
(\d{4}) Capture group 1, match 4 digits
-[\w.-]+ Match - and 1+ word chars, . or - (extend the character class with what you would allow to match)
\( Match (
([^)]+) Capture group 2, match 1+ times any char except )
\) Match )
\.\w+ Match a . and 1+ word chars
$ End of string
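
The breakdown above can also be written into the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag is acceptable in your code; the name PATTERN is only illustrative and not part of the original answer.

```python
import re

# Same pattern, spread over several lines with re.VERBOSE so that
# each part carries its own comment. Whitespace and comments inside
# the pattern are ignored by the regex engine in this mode.
PATTERN = re.compile(r"""
    ^            # start of string
    [^\W_]+      # 1+ word chars except the underscore
    _            # the underscore itself
    (\d{4})      # group 1: the 4-digit year
    -[\w.-]+     # a hyphen, then 1+ word chars, dots or hyphens
    \(           # an opening parenthesis
    ([^)]+)      # group 2: 1+ chars other than a closing parenthesis
    \)           # a closing parenthesis
    \.\w+        # a dot and the file extension
    $            # end of string
""", re.VERBOSE)

match = PATTERN.match("fle_2019-11-17A17-21-09.01(_TEXT).txt")
year, text = match.groups()
print(year, text)
```

The pattern behaves the same as the one-line version; only the layout differs.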


For example

import re

regex = r"^[^\W_]+_(\d{4})-[\w.-]+\(([^)]+)\)\.\w+$"
test_str = "fle_2019-11-17A17-21-09.01(_TEXT).txt"
print(re.findall(regex, test_str))

Output

[('2019', '_TEXT')]
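
Since the question mentions joining the two results, here is a rough sketch of how the two groups could be combined into one string. The helper name year_and_text and the choice to return None for a non-matching name are assumptions made for illustration.

```python
import re

# Compile once; the two capturing groups are the year and the text
# between the parentheses.
PATTERN = re.compile(r"^[^\W_]+_(\d{4})-[\w.-]+\(([^)]+)\)\.\w+$")


def year_and_text(filename):
    """Return the year and the parenthesised text joined together,
    or None when the filename does not fit the pattern."""
    match = PATTERN.match(filename)
    if match is None:
        return None
    year, text = match.groups()
    return year + text


print(year_and_text("fle_2019-11-17A17-21-09.01(_TEXT).txt"))  # 2019_TEXT
print(year_and_text("not-a-matching-name.txt"))                # None
```

Checking for None before calling groups() avoids an AttributeError on filenames that do not follow the expected layout.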
