I have this folder.
Consider the files sub-OAS30027_ses-d1300_run-01_T1w.nii.gz and sub-OAS30027_ses-d1300_run-02_T1w.nii.gz. Both names begin with the same prefix, sub-OAS30027_ses-d1300.
I would like to write a Python script that keeps only one file from each group of files sharing the same prefix: one among those starting with sub-OAS30027_ses-d1300, one among those starting with sub-OAS30031_ses-d0427, and so on. It doesn't matter which file of a group is kept, as long as only one is.
This is because sub-OAS30027_ses-d1300_run-01_T1w.nii.gz and sub-OAS30027_ses-d1300_run-02_T1w.nii.gz are essentially copies of each other, and I don't want the duplicates.
Could you help me?
I tried to keep it as simple as possible. I hope this helps:
import os

directory = 'directory_name'  # put in the directory you want to search through
duplicate_file_lst = []

# loop through directory files
for filename in os.listdir(directory):
    if filename.startswith("sub-OAS30027_ses-d1300"):
        duplicate_file_lst.append(filename)

# Only keep the first file in the list. os.listdir() returns bare names,
# so join each one with the directory before deleting it.
for file in duplicate_file_lst[1:]:
    os.remove(os.path.join(directory, file))
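The snippet above handles only the hard-coded sub-OAS30027_ses-d1300 prefix. A minimal sketch that applies the same keep-the-first rule to every prefix, assuming every relevant file name follows the `<subject>_<session>_run-<NN>_T1w.nii.gz` pattern from the question, could group names by the part before `_run-`. The function name `remove_duplicate_runs` and its `dry_run` parameter are hypothetical, introduced only for illustration:

```python
import os
from collections import defaultdict


def remove_duplicate_runs(directory, dry_run=True):
    """Keep one file per '<subject>_<session>' prefix and remove the rest.

    Assumes names look like 'sub-XXX_ses-YYY_run-NN_T1w.nii.gz'; files
    without '_run-' in their name are left untouched.
    With dry_run=True nothing is deleted; the function only reports.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(directory)):  # sorted, so run-01 comes first
        prefix, sep, _ = name.partition("_run-")
        if sep:  # only group files whose name contains '_run-'
            groups[prefix].append(name)

    removed = []
    for names in groups.values():
        for name in names[1:]:  # every file after the first in its group
            path = os.path.join(directory, name)
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Calling `print(remove_duplicate_runs('directory_name'))` first lists what would be deleted; once the list looks right, call it again with `dry_run=False`.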
Use the re and os modules.
PS: always keep a backup copy of the original files, so you can restore them if something goes wrong.
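import os
import re

files = os.listdir()  # file names in the current working directory
seen = set()  # prefixes (subject + session) already kept
for name in files:
    # capture the full prefix, e.g. "sub-OAS30027_ses-d1300", so that different
    # subjects sharing a session number are not treated as duplicates
    m = re.match(r'(sub-[^_]+_ses-[^_]+)_', name)
    if m:
        if m.group(1) not in seen:
            seen.add(m.group(1))  # first file with this prefix: keep it
        else:
            os.remove(name)  # another file with the same prefix: delete it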
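In the spirit of the PS about keeping copies, a more cautious variant moves the extra files into a backup folder instead of deleting them, so nothing is lost if the grouping turns out to be wrong. This is a sketch, not the answerer's code: the helper name `move_duplicates`, the `backup` folder name, and the subject+session regex are assumptions based on the file names in the question. Because the captured prefix includes the subject, two different subjects that happen to share a session number (e.g. `ses-d1300`) are kept separately.

```python
import os
import re
import shutil

# Assumed naming scheme 'sub-<id>_ses-<id>_...'; group 1 is the prefix.
PREFIX_RE = re.compile(r"^(sub-[^_]+_ses-[^_]+)_")


def move_duplicates(directory, backup_name="backup"):
    """Move every file after the first one per prefix into directory/backup_name."""
    backup_dir = os.path.join(directory, backup_name)
    os.makedirs(backup_dir, exist_ok=True)
    seen = set()
    moved = []
    # The listing is taken once up front, so moving files does not disturb the loop.
    for name in sorted(os.listdir(directory)):
        m = PREFIX_RE.match(name)
        if not m:
            continue  # not a subject/session file (this also skips the backup folder)
        if m.group(1) in seen:
            shutil.move(os.path.join(directory, name), os.path.join(backup_dir, name))
            moved.append(name)
        else:
            seen.add(m.group(1))  # first file for this prefix stays in place
    return moved
```

Once you have checked that the right files ended up in the backup folder, you can delete that folder by hand.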