We use filename conventions that define how users have to save files.
I have a huge list of all our files in Excel, and I have to inspect the filenames.
In short, each filename begins with a part number. The part number consists of "groups" separated by hyphens.
The main problem is that users sometimes (randomly) put spaces around the hyphens that separate the groups.
I have to mark each filename in the list as one of:
- correct
- similar but wrong
- not matching
"Similar" means the groups are in the right order, but a group separator is not just a hyphen (-); it is combined with one or more spaces: ' - ', '- ', ' -', etc.
I've written a regex macro in VBA. It works well, but I'm stuck on the "similar" pattern.
Here is a simplified version of one structure as a regex:
^(\d{4}-\d{2}(?:-\d{3})?-[A-Z]\d{3}-[A-Z])(?: - )(.*)
In this case the interesting part is the first capturing group, the part number. As you can see, the first capturing group contains an optional non-capturing group. The two capturing groups (part number and description) are separated by ' - '.
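The macro itself is VBA, but the pattern only uses basic constructs (\d, {n}, [A-Z], an optional (?:…) group) that JavaScript's RegExp also supports, so here is a small JavaScript sketch showing what the two capturing groups pick up from the correct examples. The variable names are illustrative only.

```javascript
// The question's strict pattern. Group 1 captures the part number, group 2 the
// description; (?:-\d{3})? is the optional, non-capturing third number block.
const strict = /^(\d{4}-\d{2}(?:-\d{3})?-[A-Z]\d{3}-[A-Z])(?: - )(.*)/;

const correct = ["1111-22-333-A444-B - DESCR.EXT", "1111-22-A444-B - DESCR.EXT"];
for (const name of correct) {
  const m = name.match(strict);
  // m would be null for a non-matching name; both examples here do match.
  console.log("part number:", m[1], "| description:", m[2]);
}
```

For the second name the optional group simply matches nothing, so group 1 is 1111-22-A444-B.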
Examples of correct filenames:
1111-22-333-A444-B - DESCR.EXT
1111-22-A444-B - DESCR.EXT
Examples of similar but wrong filenames:
1111-22 -333-A444-B - DESCR.EXT
1111-22- A444-B - DESCR.EXT
1111 -22-333-A444-B - DESCR.EXT
1111 -22 - A444- B - DESCR.EXT
1111 - 22 - A444 - B - DESCR.EXT
Examples of non-matching filenames:
1111-22-333-A444 - DESCR.EXT
1111-22-B - DESCR.EXT
1111-22-333-A444-BDESCR.EXT
1111-22 - DESCR.EXT
1111-22-33-444-B - DESCR.EXT
1111-22-444-B - DESCR.EXT
I can mark the correct and non-matching values with the pattern above, but I don't know how to modify it to check what is "similar". I tried searching for a solution here and on Google but didn't find one :/
Thank you
Reputation: 16138
Change each separator - and the (?: - ) to \s*-\s* so that any amount of leading and/or trailing whitespace is accepted, including none (just a hyphen). (I've also removed the enclosing (?:…) since it wasn't doing anything.)
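Because the same substitution has to be applied to every separator, it can help to write the pattern once with a placeholder and generate both variants from it. This is a minimal sketch under that assumption; TEMPLATE, the SEP/DESCSEP placeholders and build() are hypothetical names, not part of the original macro or demo.

```javascript
// Hypothetical template of the question's pattern: every group separator is
// written as SEP and the separator before the description as DESCSEP.
const TEMPLATE = "^(\\d{4}SEP\\d{2}(?:SEP\\d{3})?SEP[A-Z]\\d{3}SEP[A-Z])DESCSEP(.*)";

// Substitute DESCSEP first; otherwise replacing "SEP" would also corrupt "DESCSEP".
function build(groupSep, descSep) {
  const source = TEMPLATE.split("DESCSEP").join(descSep).split("SEP").join(groupSep);
  return new RegExp(source);
}

// Strict: bare hyphens between groups, exactly " - " before the description.
const valid = build("-", " - ");
// Relaxed: optional whitespace around every hyphen, as suggested above.
const similar = build("\\s*-\\s*", "\\s*-\\s*");

console.log(valid.source);
console.log(similar.source);
```

The two printed sources are the same patterns used in the demo's "Valid" and "Similar" input fields.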
In the interactive demo below there are two versions of your regex: one for validity (the regex from the question) and one for similarity, which relaxes the spacing as suggested above. Valid entries are shown in green, similar entries in red. You can edit the regexes and re-run as needed.
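// Test one <pre> filename against both regexes and write the captured
// groups into the <b> element that follows it.
function check(elem) {
  const next = elem.nextElementSibling;
  // textContent avoids HTML escaping issues (e.g. "&" becoming "&amp;").
  const name = elem.textContent;
  const okay = name.match(new RegExp(document.getElementById("okay").value));
  const sim = name.match(new RegExp(document.getElementById("sim").value));
  if (okay) {
    // Strictly valid: default (green) style.
    next.textContent = " → 1=[" + okay[1] + "] 2=[" + okay[2] + "]";
    next.className = "";
  } else if (sim) {
    // Only the relaxed regex matches: "similar" (red) style.
    next.textContent = " → 1=[" + sim[1] + "] 2=[" + sim[2] + "]";
    next.className = "similar";
  } else {
    // Matches neither: clear any previous result.
    next.textContent = "";
    next.className = "";
  }
}

// Re-run the check for every filename in the lists.
function go() {
  document.querySelectorAll("li pre").forEach(item => check(item));
}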
li { list-style:none; }
pre { display:inline-block; }
pre, ul, li { margin-top:0; margin-bottom:0 }
input[type="text"] { width:96%; font-family:monospace; }
input { display:block; }
pre + b { color:#080; font-family:monospace; }
pre + b.similar { color:#800; }
Valid: <input type="text" id="okay"
value="^(\d{4}-\d{2}(?:-\d{3})?-[A-Z]\d{3}-[A-Z]) - (.*)" />
Similar: <input type="text" id="sim"
value="^(\d{4}\s*-\s*\d{2}(?:\s*-\s*\d{3})?\s*-\s*[A-Z]\d{3}\s*-\s*[A-Z])\s*-\s*(.*)" />
<input type="button" value="go" onclick="go()" />
<b>Correct</b>
<ul id="correct">
<li><pre>1111-22-333-A444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-A444-B - DESCR.EXT</pre><b></b></li>
</ul>
<b>Similar but wrong file name</b>
<ul id="similar">
<li><pre>1111-22 -333-A444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22- A444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111 -22-333-A444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111 -22 - A444- B - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-333-A444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111 - 22 - A444 - B - DESCR.EXT</pre><b></b></li>
</ul>
<b>Non-matching filename</b>
<ul id="non-matching">
<li><pre>1111-22-333-A444 - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-333-A444-BDESCR.EXT</pre><b></b></li>
<li><pre>1111-22 - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-33-444-B - DESCR.EXT</pre><b></b></li>
<li><pre>1111-22-444-B - DESCR.EXT</pre><b></b></li>
</ul>
As you can see, the second-to-last "similar" filename (1111-22-333-A444-B - DESCR.EXT) actually matches your original regex, so it shows up as valid. I'm not sure what was intended there.
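To run the same three-way classification outside the browser (for example on a column of filenames exported from Excel), here is a minimal Node.js sketch. It reuses the two patterns from the demo; the classify() helper and its label strings are assumptions chosen to match the question's categories. The strict pattern is tried first because the relaxed one also accepts every valid name (\s* can match zero characters), which is the same reason the demo checks okay before sim.

```javascript
// Patterns from the demo's "Valid" and "Similar" input fields.
const VALID = /^(\d{4}-\d{2}(?:-\d{3})?-[A-Z]\d{3}-[A-Z]) - (.*)/;
const SIMILAR = /^(\d{4}\s*-\s*\d{2}(?:\s*-\s*\d{3})?\s*-\s*[A-Z]\d{3}\s*-\s*[A-Z])\s*-\s*(.*)/;

// Hypothetical helper: returns one of the question's three categories.
// VALID must be tested first, because SIMILAR also matches every valid name.
function classify(name) {
  if (VALID.test(name)) return "correct";
  if (SIMILAR.test(name)) return "similar but wrong";
  return "not matching";
}

const names = [
  "1111-22-333-A444-B - DESCR.EXT",
  "1111 -22 - A444- B - DESCR.EXT",
  "1111-22-333-A444 - DESCR.EXT",
];
for (const name of names) {
  console.log(classify(name).padEnd(18), name);
}
```

Neither regex uses the g flag, so repeated test() calls carry no lastIndex state between names.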