Tushar Rastogi

Regex to find N characters between underscore and period

I have a filename containing numbers, like test_20200331_2020041612345678.csv.

I want to use a regex to read only the first 8 characters of the number between the last underscore and .csv. For example, from the filename test_20200331_2020041612345678.csv I want to extract only 20200416.

Regex I tried: (?<=_)(\d+)(?=\.)

But it returns the full number between the underscore and the period, i.e. 2020041612345678.

Also, when I tried a quantifier, as in (?<=_)(\d{8})(?=\.), it does not match anything.


Answers (1)

Wiktor Stribiżew

The (?<=_)(\d{8})(?=\.) pattern does not work because the (?=\.) positive lookahead requires a . character immediately to the right of the current location, i.e. right after the eighth digit, but there are more digits in between.
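A quick way to see the difference, sketched with Python's standard re module: the original pattern finds no match in the sample filename, while allowing extra digits inside the lookahead returns the expected 8 digits.

```python
import re

s = "test_20200331_2020041612345678.csv"

# Original attempt: (?=\.) demands a "." right after the 8th digit.
strict = re.search(r"(?<=_)(\d{8})(?=\.)", s)
print(strict)  # None: after "20200416" comes "1", not "."

# With \d* inside the lookahead, the remaining digits may sit between the 8 digits and the ".".
relaxed = re.search(r"(?<=_)\d{8}(?=\d*\.)", s)
print(relaxed.group())  # 20200416
```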

You may add \d* before \. so that any number of digits can follow the required 8 digits:

(?<=_)\d{8}(?=\d*\.)

Or, with a capturing group, you do not even need lookarounds (just make sure you access Group 1 when a match is obtained):

_(\d{8})\d*\.

See the regex demo

Python demo:

import re
s = "test_20200331_2020041612345678.csv"
m = re.search(r"(?<=_)\d{8}(?=\d*\.)", s)
# m = re.search(r"_(\d{8})\d*\.", s) # capturing group approach
if m:
    print(m.group())  # => 20200416
    # print(m.group(1))  # capturing group approach 
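The same capturing-group idea can be wrapped in a small helper and applied to several filenames. This is a sketch under assumptions: the helper name extract_date is hypothetical, and anchoring the pattern to .csv at the end of the string is an added restriction, not part of the answer above. Because only digits may follow the 8 captured digits up to the extension, the match lands on the segment after the last underscore.

```python
import re
from typing import Optional

# Hypothetical helper; the ".csv$" anchor is an illustrative assumption.
PATTERN = re.compile(r"_(\d{8})\d*\.csv$")

def extract_date(filename: str) -> Optional[str]:
    """Return the first 8 digits of the digit run between the last '_' and '.csv', or None."""
    m = PATTERN.search(filename)
    return m.group(1) if m else None

for name in ["test_20200331_2020041612345678.csv",
             "report_20200101.csv",
             "test_20200331_2020.csv"]:
    print(name, "->", extract_date(name))
```

A name whose final segment has fewer than 8 digits, like the third one, yields None.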
