Reputation: 43
I'm looking for a regex to match file names in a folder. I already have all the file names in a list. Now I want to match a pattern in a loop (file is the string to match):
./test1_word1_1.1_1.2_1.3.csv
with:
match = re.search(r'./{([\w]+)}_word1_{([0-9.]+)}_{([0-9.]+)}_{([0-9.]+)}*',file)
I have gotten regexes to work before, but in this particular case it simply doesn't match. Can you help me with that?
I then want to use the regex match as follows (the expected values are written out here):
match[0] = test1
match[1] = 1.1
match[2] = 1.2
match[3] = 1.3
The curly brackets are my fault. They don't make sense at all. Sorry
Best regards, sebastian
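For readers reproducing the problem, here is a minimal sketch of why the pattern above finds nothing. In Python's re module, a `{` that does not begin a valid repetition such as `{2,3}` is matched as a literal character, so the pattern requires braces that the file name does not contain. The variable names `with_braces` and `without_braces` are illustrative only.

```python
import re

file = "./test1_word1_1.1_1.2_1.3.csv"

# Pattern from the question: the braces are treated as literal characters,
# so the file name (which contains no braces) cannot match.
with_braces = r'./{([\w]+)}_word1_{([0-9.]+)}_{([0-9.]+)}_{([0-9.]+)}*'
print(re.search(with_braces, file))  # None

# Same pattern with the braces simply removed: it matches, but the last
# group is greedy over digits and dots, so it also swallows the dot
# that precedes "csv".
without_braces = r'./([\w]+)_word1_([0-9.]+)_([0-9.]+)_([0-9.]+)*'
m = re.search(without_braces, file)
print(m.groups())  # ('test1', '1.1', '1.2', '1.3.')
```

Note also that in Python 3.6 and later, m[0] is the whole matched text; the captured pieces start at m[1].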
Upvotes: 4
Views: 11056
Reputation: 567
Since test_word<>.csv is the file name, and the content inside <> always changes and consists of dot-delimited numbers, can you try this?
r"test1_word[_0-9.]*\.csv"
Sample code and test strings
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

# The dot before "csv" is escaped so that it matches only a literal "."
regex = r"test1_word[_0-9.]*\.csv"
test_str = ("./test1_word1_1.1_1.2_1.3.csv\n"
            "./test1_word1_1.31.2_1.555.csv\n"
            "./test1_word1_10.31.2_2000.00.csv")
matches = re.finditer(regex, test_str)
for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))
    # This pattern has no capture groups, so the inner loop prints nothing
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
Want to test? https://regex101.com/ will help you.
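Since the question starts from a list of file names, the same idea can be sketched as a filter over that list. This is only an illustration: the `filenames` list is made up, and the dot before csv is escaped on the assumption that a literal dot is intended.

```python
import re

# Hypothetical file names standing in for the list mentioned in the question.
filenames = [
    "./test1_word1_1.1_1.2_1.3.csv",
    "./test1_word1_10.31.2_2000.00.csv",
    "./test1_word1_1.1.txt",
    "./other_file.csv",
]

# Compile once, then reuse the pattern inside the loop.
pattern = re.compile(r"test1_word[_0-9.]*\.csv")

# Keep only the names in which the pattern is found.
matching = [name for name in filenames if pattern.search(name)]
print(matching)
```

This pattern only tells you whether a name fits; it has no capture groups, so it cannot return the individual numbers.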
Upvotes: 1
Reputation: 627607
You may use
r'\./([^\W_]+)_word1_([0-9.]+)_([0-9.]+)_([0-9]+(?:\.[0-9]+)*)'
See the regex demo
Details:
\. - a literal dot (if it is unescaped, it matches any char other than a line break char)
/ - a / symbol (no need to escape it in a Python regex pattern)
([^\W_]+) - Group 1 matching 1 or more letters or digits (if you want to match a chunk containing _, keep your original (\w+) pattern)
_word1_ - a literal substring
([0-9.]+) - Group 2 matching 1 or more digits and/or . symbols
_ - an underscore
([0-9.]+) - Group 3 matching 1 or more digits and/or . symbols
_ - an underscore
([0-9]+(?:\.[0-9]+)*) - Group 4 matching 1 or more digits, then 0+ sequences of a . and 1 or more digits

import re
rx = r"\./([^\W_]+)_word1_([0-9.]+)_([0-9.]+)_([0-9]+(?:\.[0-9]+)*)"
s = "./test1_word1_1.1_1.2_1.3.csv"
m = re.search(rx, s)
if m:
    print("Part1: {}\nPart2: {}\nPart3: {}\nPart4: {}".format(m.group(1), m.group(2), m.group(3), m.group(4)))
Output:
Part1: test1
Part2: 1.1
Part3: 1.2
Part4: 1.3
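To apply this to the whole list of file names from the question, a loop along the following lines might work. It is a sketch under assumptions: the `filenames` list and the `parsed` dictionary are hypothetical, and names that do not match are simply skipped.

```python
import re

# Same pattern as above, compiled once for reuse in the loop.
RX = re.compile(r"\./([^\W_]+)_word1_([0-9.]+)_([0-9.]+)_([0-9]+(?:\.[0-9]+)*)")

# Hypothetical file names standing in for the list mentioned in the question.
filenames = [
    "./test1_word1_1.1_1.2_1.3.csv",
    "./test2_word1_10.5_0.25_7.csv",
    "./notes.txt",
]

parsed = {}
for file in filenames:
    m = RX.search(file)
    if m is None:
        continue  # not one of the expected file names
    # groups() returns only the captured parts; index 0 (the whole match) is not included.
    name, *numbers = m.groups()
    parsed[name] = numbers

print(parsed)
# {'test1': ['1.1', '1.2', '1.3'], 'test2': ['10.5', '0.25', '7']}
```

Because Group 4 must end in a digit, the dot before csv is never captured, which is why the last value is 1.3 rather than 1.3. with a trailing dot.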
Upvotes: 2