Graeme

Reputation: 103

Replace strings in a list (using re.sub)

I am trying to strip the extensions (and the digit before them) from a list of file names. I would like to loop through the items (files) and remove that part of each name. I don't know how to loop through the list appropriately, because re.sub requires a string as its third parameter, e.g. re.sub(pattern, repl, string, count=0, flags=0).

import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed =[]

for file in file_lst:
    file_lst_trimmed = re.sub(r'1.fa', '', file)

The issue arising here is that re.sub expects a string and I want it to loop through a list of strings.

Thanks for any advice!

Upvotes: 9

Views: 28961

Answers (5)

colopop

Reputation: 373

Your loop is actually perfectly fine! There are two other issues.

  1. You're setting file_lst_trimmed equal to your string every iteration of the loop. You want to use append as in file_lst_trimmed.append("apple").

  2. Your regular expression is '1.fa', which only matches names with a 1 before the extension, when it should really just be '\.fa' (assuming you only want to strip .fa extensions).

EDIT: I now see that you also want to remove the last number. In that case, you'll want '\d+\.fa'. Here \d is a stand-in for any digit 0-9, and \d+ means a run of one or more digits, so this will remove 10, 11, 13254, etc. The \ before the . is needed because . is a special character (it matches any character) and has to be escaped to match a literal period. If you want to remove arbitrary file extensions, put \w+ in place of fa; \w+ matches a run of letters, digits, or underscores of any length. You might want to check out the documentation for Python's re module.
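Putting both fixes together, the loop from the question could look something like the sketch below. It keeps the question's variable names and uses the '\d+\.fa' pattern from this answer; the raw-string prefix (r'...') is an addition so that Python does not try to interpret the backslashes itself.

```python
import re

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed = []

for file in file_lst:
    # \d+ matches the run of digits, \. matches a literal period.
    # append() adds each cleaned name instead of overwriting the list.
    file_lst_trimmed.append(re.sub(r'\d+\.fa', '', file))

print(file_lst_trimmed)  # ['cats', 'cats', 'dog', 'dog']
```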

Upvotes: 0

Anton vBR

Reputation: 18916

No need for regex; use os.path.splitext from the standard library for this. From the documentation:

Split the pathname path into a pair (root, ext) such that root + ext == path, and ext is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; splitext('.cshrc') returns ('.cshrc', '').

import os.path

l = ['hello.fa', 'images/hello.png']

[os.path.splitext(filename)[0] for filename in l]

Returns

['hello', 'images/hello']
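As a quick illustration of the documented behaviour quoted above, here is a small sketch. The file names are made up for the example; note that splitext only removes the extension, so the trailing digit asked about in the question would still need to be stripped separately.

```python
import os.path

# Only the last extension is split off.
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')

# A leading period on the basename is not treated as an extension.
print(os.path.splitext('.cshrc'))          # ('.cshrc', '')

# Digits before the extension are kept: 'cats1.fa' gives 'cats1', not 'cats'.
stems = [os.path.splitext(name)[0] for name in ['cats1.fa', 'dog2.fa']]
print(stems)                               # ['cats1', 'dog2']
```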

Upvotes: 0

Amit Tripathi

Reputation: 7261

I prefer Python's built-in string methods to importing and using a library where possible. Using a regex for such a simple task might not be the best way to do it, and this approach looks clean. Note that it assumes exactly one trailing digit and no other periods in the file name.

Try this

file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
file_lst_trimmed =[]
for file in file_lst:
    file_lst_trimmed.append(file.split('.')[0][:-1])
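If the names may contain several trailing digits or extra periods, a variant along the following lines might be more robust while still using only string methods. This is a sketch, not part of the original answer: the helper name strip_ext_and_digits and the extra sample file names are hypothetical.

```python
def strip_ext_and_digits(name):
    """Hypothetical helper: drop the last extension, then any trailing digits."""
    # rsplit with maxsplit=1 splits only at the last period.
    stem = name.rsplit('.', 1)[0]
    # rstrip removes every trailing character found in the given set.
    return stem.rstrip('0123456789')

file_lst = ['cats1.fa', 'cats10.fa', 'my.dog2.fa']
file_lst_trimmed = [strip_ext_and_digits(f) for f in file_lst]
print(file_lst_trimmed)  # ['cats', 'cats', 'my.dog']
```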

Upvotes: 0

Ajax1234

Reputation: 71461

You can try this:

import re
file_lst = ['cats1.fa', 'cats2.fa', 'dog1.fa', 'dog2.fa']
# Raw string so the backslashes reach the regex engine unchanged
final_list = [re.sub(r'\d+\.\w+$', '', i) for i in file_lst]

Output:

['cats', 'cats', 'dog', 'dog']

Upvotes: 5

James

Reputation: 36691

You can use a list comprehension to construct the new list with the cleaned-up file names. \d is the regex that matches a single digit, and $ only matches at the end of the string.

file_lst_trimmed = [re.sub(r'\d\.fa$', '', file) for file in file_lst]

The results:

>>> file_lst_trimmed 
['cats', 'cats', 'dog', 'dog']

Upvotes: 23
