Reputation: 23
I have a filename containing numerals, like test_20200331_2020041612345678.csv.
I want to read only the first 8 characters of the number between the last underscore and .csv using a regex. E.g., from the file name test_20200331_2020041612345678.csv I want to read only 20200416.
Regex tried: (?<=_)(\d+)(?=\.)
But it returns the full number between the underscore and the period, i.e. 2020041612345678.
Also, when I tried a fixed quantifier, as in (?<=_)(\d{8})(?=\.), it does not match anything in the string.
Upvotes: 1
Views: 742
Reputation: 626870
The (?<=_)(\d{8})(?=\.) pattern does not work because the (?=\.) positive lookahead requires a . char immediately to the right of the current location, i.e. right after the eighth digit, but there are more digits in between.
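To make the failure concrete, here is a minimal sketch that runs both patterns from the question against the sample file name. The variable names greedy and no_match are only illustrative.

```python
import re

s = "test_20200331_2020041612345678.csv"

# \d+ is greedy: it consumes every digit up to the dot.
greedy = re.search(r"(?<=_)(\d+)(?=\.)", s)
print(greedy.group(1))

# \d{8} stops after eight digits, but the next char is a digit,
# not a dot, so the (?=\.) lookahead fails at every position.
no_match = re.search(r"(?<=_)(\d{8})(?=\.)", s)
print(no_match)
```

The first call prints the full 16-digit number and the second prints None, which matches the behaviour described in the question.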
You may add \d* before \. to match any number of digits after the required 8 digits:
(?<=_)\d{8}(?=\d*\.)
Or, with a capturing group, you do not even need lookarounds (just make sure you access Group 1 when a match is obtained):
_(\d{8})\d*\.
Python demo:
import re

s = "test_20200331_2020041612345678.csv"
m = re.search(r"(?<=_)\d{8}(?=\d*\.)", s)
# m = re.search(r"_(\d{8})\d*\.", s)  # capturing group approach
if m:
    print(m.group())  # => 20200416
    # print(m.group(1))  # capturing group approach
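If this is needed for many file names, the capturing group approach could be wrapped in a small helper. This is only a sketch: the function name first_eight_digits is hypothetical, and anchoring on \.csv$ is an added assumption (the patterns above only require a dot) so that a name such as data_1234567890.csv.bak is rejected.

```python
import re
from typing import Optional

# Assumption: the file name ends in ".csv"; the patterns in the answer
# only require a dot after the digits.
_PATTERN = re.compile(r"_(\d{8})\d*\.csv$")

def first_eight_digits(filename: str) -> Optional[str]:
    """Return the first 8 digits of the number before .csv, or None."""
    m = _PATTERN.search(filename)
    return m.group(1) if m else None

print(first_eight_digits("test_20200331_2020041612345678.csv"))
print(first_eight_digits("test_2020.csv"))
```

The first call prints 20200416. The second prints None because fewer than eight digits follow the underscore.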
Upvotes: 1