Reputation: 119
I'm trying to rename multiple files using a regex in Python (3.8), reordering the file names for consistency. The aim is to move the part number (PNo) to the beginning of the file name where one is present. Not all files contain a PNo.
Some example file name structures I am working with in my testing are shown below. I've tried to capture some of the possible variations on how things may have been entered previously.
Test Document One PNo 6477 Rev 2
Test Document TwoPno5555 - Rev 1
Test Document 3 PNo5343 rev 2
PNo 6478 - Test Document 4 Rev1
Test Document Five Pno 3333
For the most part, my regex works as desired; however, there are two things I'd still like to achieve:
First, documents two and four have an existing hyphen, and it becomes duplicated when the groups are combined to create the new file name. I've tried adding [-] to the regex, but it breaks the third group, and I couldn't get that to work for files without a hyphen in their name. What is the best way to address this?
Second, when an existing part number has no space between the letters and the digits (e.g. PNo5343), I'd like to add one in the new file name. Can this be done using the existing Python group somehow? I considered splitting the PNo into two separate groups, but thought the risk of matching four digits elsewhere in a filename (e.g. dates) would mess this up.
I'd be happy for some critique on what I've done here. This is my first attempt at writing a regex so if there's a better way, I'm all ears. Thx
import os
import re

PNoRegex = re.compile(r"""^(.*?)                # all text before the part number
    (PNo\s\d{4}|PNo\d{4}|Pno\s\d{4}|Pno\d{4})   # part number details
    \s*                                         # remove white space after PNo string
    (.*)$                                       # all text after Part No
    """, re.VERBOSE)

for originalFile in os.listdir('.'):
    fileNameText = PNoRegex.search(originalFile)
    # Skip files without a Regex match
    if fileNameText is None:
        continue
    # separate the groups
    beforePNo = fileNameText.group(1)
    PNo = fileNameText.group(2)
    afterPNo = fileNameText.group(3)
    # Form the reordered filename.
    newFileName = PNo + ' - ' + beforePNo + afterPNo
Edit: Screenshots added of the files.
List of files before regex operation
Upvotes: 2
Views: 147
Reputation: 18631
Use re.sub:
re.sub(r'(?i)^(.*?)\s*(PNo)\s*(\d{4})\s*(?:-\s*)?(.*)$', r'\2 \3 - \1 \4', string)
Explanation:
NODE                     EXPLANATION
--------------------------------------------------------------------------------
(?i)                     set flags for this block (case-insensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally)
--------------------------------------------------------------------------------
^                        the beginning of the string
--------------------------------------------------------------------------------
(                        group and capture to \1:
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
)                        end of \1
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(                        group and capture to \2:
--------------------------------------------------------------------------------
  PNo                      'PNo'
--------------------------------------------------------------------------------
)                        end of \2
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(                        group and capture to \3:
--------------------------------------------------------------------------------
  \d{4}                    digits (0-9) (4 times)
--------------------------------------------------------------------------------
)                        end of \3
--------------------------------------------------------------------------------
\s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(?:                      group, but do not capture (optional
                         (matching the most amount possible)):
--------------------------------------------------------------------------------
  -                        '-'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                           or more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
)?                       end of grouping
--------------------------------------------------------------------------------
(                        group and capture to \4:
--------------------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
)                        end of \4
--------------------------------------------------------------------------------
$                        before an optional \n, and the end of the
                         string
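To see how this substitution behaves on the sample names from the question, here is a minimal sketch. The pattern and replacement string are the ones given in this answer. The helper name reorder and the final clean-up step are my own assumptions: the replacement can leave a double or trailing space when group 1 or group 4 is empty, so the sketch collapses those.

```python
import re

# Pattern and replacement copied from the answer.
PATTERN = re.compile(r'(?i)^(.*?)\s*(PNo)\s*(\d{4})\s*(?:-\s*)?(.*)$')
REPLACEMENT = r'\2 \3 - \1 \4'


def reorder(name):
    """Hypothetical helper: move the part number to the front of a name."""
    new_name, count = PATTERN.subn(REPLACEMENT, name)
    if count == 0:
        # No part number found: leave the name untouched.
        return name
    # Assumed clean-up: an empty group 1 or group 4 leaves a stray space
    # around the separator or at the end of the result.
    return re.sub(r' {2,}', ' ', new_name).strip()


samples = [
    "Test Document One PNo 6477 Rev 2",
    "Test Document TwoPno5555 - Rev 1",
    "Test Document 3 PNo5343 rev 2",
    "PNo 6478 - Test Document 4 Rev1",
    "Test Document Five Pno 3333",
]

for sample in samples:
    print(f"{sample!r} -> {reorder(sample)!r}")
```

The matched label keeps its original capitalisation (Pno stays Pno). If a uniform prefix is wanted, the \2 in the replacement could be swapped for a literal PNo.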
Upvotes: 0
Reputation: 163477
You can shorten the alternation to (P[nN]o)\s?(\d{4}) by using a character class for the n/N and matching an optional whitespace char.
That uses 2 capturing groups instead of 1, so the label and the digits are available separately whether or not there is a space between pno and the digits.
To match the optional hyphen, you can match any run of whitespace chars or hyphens after the digits with the character class [-\s]*
This results in separate groups for each of the parts in the current example data.
^(.*?)(P[nN]o)\s?(\d{4})[-\s]*(.*)$
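As a rough sketch of how those four groups might be recombined, the function below rebuilds the name with a single space inside the part number and a single hyphen after it. The regex is the one from this answer. The names PNO_RE and build_name, and the choice to strip and join the leftover text, are assumptions for illustration only.

```python
import re

# Regex copied from the answer; group 1 = text before, 2 = label,
# 3 = digits, 4 = text after the part number.
PNO_RE = re.compile(r'^(.*?)(P[nN]o)\s?(\d{4})[-\s]*(.*)$')


def build_name(name):
    """Hypothetical helper: return the reordered name, or None if no PNo."""
    match = PNO_RE.match(name)
    if match is None:
        return None
    before, label, digits, after = match.groups()
    # Always put exactly one space between the label and the digits.
    part_number = f'{label} {digits}'
    # Join the non-empty leftovers so an empty group adds no extra spaces.
    description = ' '.join(p for p in (before.strip(), after.strip()) if p)
    if not description:
        return part_number
    return f'{part_number} - {description}'


for sample in ("Test Document TwoPno5555 - Rev 1",
               "PNo 6478 - Test Document 4 Rev1",
               "Test Document Five Pno 3333"):
    print(build_name(sample))
```

Returning None for names without a part number lets the calling loop skip those files, much like the continue in the question's code.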
Upvotes: 1