Reputation: 103
I am trying to strip the extensions from a list of file names. I would like to loop through the items (files) and remove the extension from each one. I don't know how to loop through the items in the list properly, because re.sub requires a string as its third parameter, e.g. re.sub(pattern, repl, string, count=0, flags=0)
import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []
for file in file_lst:
    file_lst_trimmed = re.sub(r'1.fa', '', file)
The issue arising here is that re.sub expects a string and I want it to loop through a list of strings.
Thanks for any advice!
Upvotes: 9
Views: 28961
Reputation: 373
Your loop is actually perfectly fine! There are two other issues.
You're setting file_lst_trimmed equal to your string on every iteration of the loop, so the list is overwritten each time. You want to use append, as in file_lst_trimmed.append("apple").
Your regular expression is '1.fa' when it should really just be '\.fa' (assuming you only want to strip .fa extensions).
EDIT: I now see that you also want to remove the last number. In that case, you'll want '\d+\.fa' (\d is a stand-in for any digit 0-9, and \d+ means a run of one or more digits, so this will remove 10, 11, 13254, etc. The \ before the . is there because . is a special character that needs to be escaped). If you want to remove arbitrary file extensions, you'll want to put \w+ instead of fa, which matches a run of letters, digits, or underscores of any length. You might want to check out the documentation for Python's re module.
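Putting both fixes together, a minimal sketch might look like the following. The trailing $ anchor is an addition that is not part of the pattern suggested above; it is assumed here so that only a match at the very end of the name is removed.

```python
import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []
for file in file_lst:
    # Append each cleaned name instead of rebinding the list variable.
    # r'\d+\.fa$' matches one or more digits, a literal dot, then "fa"
    # at the end of the string (the $ is an assumption, see above).
    file_lst_trimmed.append(re.sub(r'\d+\.fa$', '', file))

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```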
Upvotes: 0
Reputation: 18916
No need for regex; use os.path.splitext from the standard library for this.
Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').
import os.path
l = ['hello.fa', 'images/hello.png']
[os.path.splitext(filename)[0] for filename in l]
Returns
['hello', 'images/hello']
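Note that splitext only removes the extension, so names like 'cats1.fa' would keep their trailing digit. If the digit should go as well, one possible sketch is to strip trailing digits from the root afterwards. The use of str.rstrip with string.digits is an assumption about what is wanted here: it removes every trailing digit, not just one.

```python
import os.path
import string

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']

# splitext(name)[0] drops the extension; rstrip(string.digits) then
# removes any run of digits left at the end of the root.
trimmed = [os.path.splitext(name)[0].rstrip(string.digits) for name in file_lst]

print(trimmed)  # ['cats', 'cats', 'dog', 'dog']
```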
Upvotes: 0
Reputation: 7261
I prefer Python's built-in functions to importing and using a library where possible. Using regex for such a simple task might not be the best way to do it. This approach looks clean, although it assumes each name contains a single dot and ends in exactly one digit before the extension.
Try this
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []
for file in file_lst:
    file_lst_trimmed.append(file.split('.')[0][:-1])
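The slice [:-1] drops exactly one character, so a name such as 'cats12.fa' would come out as 'cats1'. If multi-digit suffixes are possible, a variant that still uses only string methods could look like this. The helper name trim_name is hypothetical and only used for illustration.

```python
import string

def trim_name(filename):
    """Drop the last extension, then any trailing digits (hypothetical helper)."""
    # rsplit('.', 1) splits on the last dot only; a name without a dot
    # comes back unchanged as the single element of the list.
    root = filename.rsplit('.', 1)[0]
    return root.rstrip(string.digits)

file_lst = ['cats1.fa', 'cats12.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = [trim_name(file) for file in file_lst]

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```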
Upvotes: 0
Reputation: 71461
You can try this:
import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]
Output:
['cats', 'cats', 'dog', 'dog']
Upvotes: 5
Reputation: 36691
You can use a list comprehension to construct the new list with the cleaned-up file names. \d is the regex to match a single digit and $ only matches at the end of the string.
file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]
The results:
>>> file_lst_trimmed
['cats', 'cats', 'dog', 'dog']
Upvotes: 23