Paul Kloppers

Reputation: 29

REGEX - Remove Unwanted Text

I have a list of items (for example, files in a folder); each item in the list is its own string.

In the example, the X--Y-- parts have incrementing digits.

My program has the filenames in a list, e.g. ["file1.txt", "file2.txt"].

item 1: "X1Y2 alehandro alex.txt"

item 2: "X1Y3 james file of files.txt"
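Because the digits after X and Y change from file to file, a pattern for the prefix can use `\d+` (one or more digits) instead of literal numbers. A minimal Python sketch, assuming every name starts with this X<digits>Y<digits> prefix; `PREFIX` and `prefix_of` are hypothetical names used only for illustration:

```python
import re
from typing import Optional

# Hypothetical pattern: "X", one or more digits, "Y", one or more digits
PREFIX = re.compile(r"X\d+Y\d+")


def prefix_of(name: str) -> Optional[str]:
    """Hypothetical helper: return the X<digits>Y<digits> prefix, or None if absent."""
    m = PREFIX.match(name)  # match() only tries at the start of the string
    return m.group(0) if m else None


for name in ["X1Y2 alehandro alex.txt", "X1Y3 james file of files.txt"]:
    print(prefix_of(name))
```

This only extracts the prefix; the extension still has to be re-attached to produce names like X1Y2.txt.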

So for each string I want to keep only the first part (the "X1Y2" part) and remove all the extra text from the filename.

I just want a regex that does this; I still struggle with regex.

I need to pass it through a "replace with empty string" operation (I'm using Microsoft PowerToys' PowerRename to do this).
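With an empty replacement, the pattern has to match only the unwanted part: everything from the first space up to, but not including, the final extension. One way to express that is ` .*(?=\.[^.]+$)`, where the `(?=...)` lookahead checks that an extension follows without consuming it. The Python sketch below only demonstrates the pattern with `re.sub`; whether PowerRename accepts it unchanged (with its "Use regular expressions" option enabled) is an assumption worth testing on a copy of the files first.

```python
import re

# A space, then as much text as possible, stopping just before the last ".ext".
# The (?=...) lookahead requires the extension to follow but leaves it in place.
UNWANTED = re.compile(r" .*(?=\.[^.]+$)")

files = ["X1Y2 alehandro alex.txt", "X1Y3 james file of files.txt"]
renamed = [UNWANTED.sub("", f) for f in files]  # replace each match with ""
print(renamed)  # ['X1Y2.txt', 'X1Y3.txt']
```

Names that contain no space are left unchanged, and dots inside the extra text are removed too, because the lookahead only accepts the last dot.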

Alternatives in PowerShell are also welcome.
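No PowerShell version is given in this thread, but the same idea can be scripted in Python (the language used in the answer to this question) to rename the files on disk with `pathlib`. This is a sketch under assumptions: `rename_in_folder` is a hypothetical helper, names without a space are left alone, and a rename is skipped if the target name already exists rather than overwriting it. Try it on a copy of the folder first.

```python
import re
import tempfile
from pathlib import Path
from typing import List

# The first space and everything after it, up to (not including) the final ".ext"
UNWANTED = re.compile(r" .*(?=\.[^.]+$)")


def rename_in_folder(folder: Path) -> List[str]:
    """Hypothetical helper: strip the extra text from each file name; return new names."""
    new_names = []
    for path in sorted(folder.iterdir()):  # sorted() snapshots the listing first
        if not path.is_file():
            continue
        new_name = UNWANTED.sub("", path.name)
        target = path.with_name(new_name)
        # Skip unchanged names and never overwrite an existing file
        if new_name != path.name and not target.exists():
            path.rename(target)
            new_names.append(new_name)
    return new_names


# Demo on a throwaway folder using the question's example names
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ["X1Y2 alehandro alex.txt", "X1Y3 james file of files.txt"]:
        (folder / name).touch()
    renamed = rename_in_folder(folder)
    remaining = sorted(p.name for p in folder.iterdir())

print(renamed)    # ['X1Y2.txt', 'X1Y3.txt']
print(remaining)  # ['X1Y2.txt', 'X1Y3.txt']
```

The explicit `exists()` check matters because `Path.rename` silently replaces an existing target on POSIX systems, while on Windows it raises an error.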

Any advice would be appreciated.

I want the output to be the following:

["X1Y2.txt","X2Y3.txt","X4Y3.txt"] with the unwanted extra text removed.

Upvotes: 0

Views: 51

Answers (1)

Tim Biegeleisen

Reputation: 521467

In Python, a general solution using re.sub along with a list comprehension might be:

import re

files = ["X1Y2 alehandro alex.txt", "X1Y3 james file of files.txt"]
output = [re.sub(r'(\S+).*\.(\w+)$', r'\1.\2', f) for f in files]
print(output)  # ['X1Y2.txt', 'X1Y3.txt']
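To make the pattern above easier to follow, here is the same regex written with `re.VERBOSE`, which ignores whitespace inside the pattern and allows `#` comments. It should behave like the one-line version; the name `PATTERN` is only for illustration.

```python
import re

PATTERN = re.compile(r"""
    (\S+)     # group 1: the first run of non-space characters, e.g. X1Y2
    .*        # any text after it (greedy, so the dot below is the last one)
    \.        # a literal dot
    (\w+)     # group 2: the extension, e.g. txt
    $         # end of the string
""", re.VERBOSE)

files = ["X1Y2 alehandro alex.txt", "X1Y3 james file of files.txt"]
output = [PATTERN.sub(r"\1.\2", f) for f in files]  # keep prefix + "." + extension
print(output)  # ['X1Y2.txt', 'X1Y3.txt']
```

If a name has no extension, the pattern does not match and `re.sub` returns the name unchanged.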

Upvotes: 1
