Reputation: 127
I have a directory containing mutliple files with similar names and subdirectories named after these so that files with like-names are located in that subdirectory. I'm trying to concatenate all the .sdf files in a given subdirectory to a single .sdf file.
import os
from os import system
for ele in os.listdir(Path):
if ele.endswith('.sdf'):
chdir(Path + '/' + ele[0:5])
system('cat' + ' ' + '*.sdf' + '>' + ele[0:5] + '.sdf')
However when I run this, the concatenated file includes every .sdf file from the original directory rather than just the .sdf files from the desired one. How do I alter my script to concatenate the files in the subdirectory only?
Upvotes: 1
Views: 853
Reputation: 140246
this is a very clumsy way of doing it. Using chdir
is not recommended, and system
either (deprecated, and overkill to call cat
)
Let me propose a pure python implementation using glob.glob
to filter the .sdf
files, and read each file one by one and write to the big file opened before the loop:
import glob,os
big_sdf_file = "all_data.sdf" # I'll let you compute the name/directory you want
with open(big_sdf_file,"wb") as fw:
for sdf_file in glob.glob(os.path.join(Path,"*.sdf")):
with open(sdf_file,"rb") as fr:
fw.write(fr.read())
I left big_sdf_file
not computed, I would not recommend to put it in the same directory as the other files, since running the script twice would result in taking the output as input as well.
Note that the drawback of this approach is that if the files are big, they're read fully into memory, which can cause problems. In that case, replace
fw.write(fr.read())
by:
shutil.copyfileobj(fr,fw)
(importing shutil
is necessary in that case). That allows packet copy instead of full-file read/write.
I'll add that it's probably not the full solution you're expecting, since there seem to be something about scanning the sub-directories of Path
to create 1 big .sdf
file per sub-directory, but with the provided code which doesn't use any system command or chdir
, it should be easier to adapt to your needs.
Upvotes: 4