Reputation: 824
I am trying to sort DICOM files from multiple subjects into their respective folders based on their PatientID. The current directory holds the DICOM files for all subjects, unsorted. I can already go through the DICOM directory, group subjects by their PatientID, and count how many files each subject has. Is it possible to copy or move the files to another directory and sort them into one folder per PatientID?
code:
os.listdir('\\dicoms')
device = torch.device("cuda")
print(device)
input_path = '\\dicoms\\'
ds_columns = ['ID', 'PatientID', 'Modality', 'StudyInstance',
              'SeriesInstance', 'PhotoInterpretation', 'Position0',
              'Position1', 'Position2', 'Orientation0', 'Orientation1',
              'Orientation2', 'Orientation3', 'Orientation4', 'Orientation5',
              'PixelSpacing0', 'PixelSpacing1']
def extract_dicom_features(ds):
    ds_items = [ds.SOPInstanceUID,
                ds.PatientID,
                ds.Modality,
                ds.StudyInstanceUID,
                ds.SeriesInstanceUID,
                ds.PhotometricInterpretation,
                ds.ImagePositionPatient,
                ds.ImageOrientationPatient,
                ds.PixelSpacing]
    line = []
    for item in ds_items:
        if type(item) is pydicom.multival.MultiValue:
            line += [float(x) for x in item]
        else:
            line.append(item)
    return line
list_img = os.listdir(input_path + 'imgs')
print(len(list_img))
df_features = []
for img in tqdm.tqdm(list_img):
    img_path = input_path + 'imgs/' + img
    ds = pydicom.read_file(img_path)
    df_features.append(extract_dicom_features(ds))
df_features = pd.DataFrame(df_features, columns=ds_columns)
df_features.head()
df_features.to_csv('\\meta.csv')
print(Counter(df_features['PatientID']))
example of metadata:
,ID,PatientID,Modality,StudyInstance,SeriesInstance,PhotoInterpretation,Position0,Position1,Position2,Orientation0,Orientation1,Orientation2,Orientation3,Orientation4,Orientation5,PixelSpacing0,PixelSpacing1
0,ID_000012eaf,ID_f15c0eee,CT,ID_30ea2b02d4,ID_0ab5820b2a,MONOCHROME2,-125.0,-115.89798,77.970825,1.0,0.0,0.0,0.0,0.927184,-0.374607,0.488281,0.488281
example of Counter output:
Counter({'ID_19702df6': 28, 'ID_b799ed34': 26, 'ID_e3523464': 26, 'ID_cd9169c2': 26, 'ID_e326a8a4': 24, 'ID_45da90cb': 24, 'ID_99e4f787': 24, 'ID_df751e93': 24, 'ID_929a5b39': 20})
I added the following code to try to sort the images into subdirectories but I run into an error:
dest_path = input_path+'imageProcessDir'
counter = 0
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    image = cv2.imread(input_path+rows['ID'])
    counter=counter+1
    fold = rows['PatientID']+"/"
    dest_fold = dest_path+fold
    cv2.imwrite(dest_fold+"/"+filename+ "_" +str(counter)+".dcm", img)
error:
Traceback (most recent call last):
  File "ct_move.py", line 77, in <module>
    cv2.imwrite(dest_fold+"/"+filename+ "_" +str(counter)+".dcm", img)
TypeError: Expected cv::UMat for argument 'img'
Upvotes: 0
Views: 1189
Reputation: 456
To address your issue: using opencv here at all seems like overkill. If all you want is to move the DICOM images from one location to another on the filesystem, you could use os.rename or shutil.move (both are in the standard library and work on Windows as well as UNIX-like systems). Unless you are modifying image content, these are cleaner and faster solutions.
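As a minimal sketch of that idea, a plain filesystem move into a per-patient folder could look like the following. The helper name move_to_patient_folder is hypothetical, and the sketch assumes the PatientID has already been read from the file:

```python
import os
import shutil


def move_to_patient_folder(src_file, dest_root, patient_id):
    """Move src_file into dest_root/patient_id and return the new path.

    move_to_patient_folder is a hypothetical helper, not part of any library.
    """
    dest_fold = os.path.join(dest_root, patient_id)
    os.makedirs(dest_fold, exist_ok=True)  # create the patient folder if needed
    # shutil.move falls back to copy-and-delete when a plain rename is not possible
    return shutil.move(src_file, os.path.join(dest_fold, os.path.basename(src_file)))
```

No image decoding is involved, so the file's bytes arrive at the destination unchanged.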
I noticed two little things in your last code block:
I think you want the fold variable to have the "/" prefixed instead of suffixed for the paths to work.
Also, the counter will continue to increment across all dicoms, where I think you want it to increment on a per-subject basis (I am assuming that df_features will be sorted on PatientID here; if it is not, maybe you could use the Counter class).
dest_path = input_path+'imageProcessDir'
counter = 0
prev_fold = '/' + df_features.loc[0, 'PatientID']
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    fold = '/' + rows['PatientID']
    if fold != prev_fold:
        counter = 0 # reset when the PatientID changes, before naming the file
        prev_fold = fold
    counter = counter + 1
    dest_fold = dest_path + fold
    out_file = dest_fold + "/" + filename + "_" + str(counter) + ".dcm"
    os.rename(input_path + rows['ID'], out_file)
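If df_features is not sorted by PatientID, keeping one counter per patient avoids the reset logic altogether. A minimal sketch, assuming (ID, PatientID) pairs such as those in the ID and PatientID columns; numbered_names is a hypothetical helper name:

```python
from collections import defaultdict


def numbered_names(rows):
    """Yield (patient_id, file_name) with a counter that restarts at 1 per patient.

    rows is any iterable of (sop_id, patient_id) pairs; their order does not matter.
    """
    counts = defaultdict(int)  # one independent counter per PatientID
    for sop_id, patient_id in rows:
        counts[patient_id] += 1
        yield patient_id, "{}_{}.dcm".format(sop_id, counts[patient_id])
```

It could be fed with zip(df_features['ID'], df_features['PatientID']), for example.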
I would also use os.path.join to handle filesystem paths instead of adding "/" to everything:
fold = rows['PatientID']
dest_fold = os.path.join(dest_path, fold)
I suggest this because I think there is also an issue with the input file path: input_path + rows['ID']
edit:
This is to get rid of the use of '/' and put in os.path.join
dest_path = os.path.join(input_path, 'imageProcessDir')
counter = 0
prev_fold = df_features.loc[0, 'PatientID']
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    fold = rows['PatientID']
    if fold != prev_fold:
        counter = 0 # reset when the PatientID changes, before naming the file
        prev_fold = fold
    counter = counter + 1
    dest_fold = os.path.join(dest_path, fold)
    os.makedirs(dest_fold, exist_ok=True) # make sure target folder exists
    out_file = os.path.join(dest_fold, filename + "_" + str(counter) + ".dcm")
    os.rename(os.path.join(input_path, rows['ID']), out_file)
Also, note that os.rename(os.path.join(input_path, rows['ID']), out_file) may need to be os.rename(os.path.join(input_path, rows['ID'] + '.dcm'), out_file) if the files on disk carry a .dcm extension that the ID column does not include.
If it's not too much, you may want to make a backup of your files before attempting this, to make sure you get what you want out!
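One simple way to take such a backup from Python is shutil.copytree, which copies a whole directory tree. This is only a sketch; backup_directory is a hypothetical helper name and the paths are whatever suits your setup:

```python
import shutil


def backup_directory(src_dir, backup_dir):
    """Copy the whole src_dir tree to backup_dir before any files are moved.

    backup_dir must not exist yet; copytree raises FileExistsError otherwise,
    which also protects an earlier backup from being overwritten.
    """
    return shutil.copytree(src_dir, backup_dir)
```

Once the sorted output looks right, the backup folder can be deleted by hand.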
Upvotes: 1
Reputation: 824
Thank you, I solved the problem with your help.
Solution:
os.listdir('directory')
device = torch.device("cuda")
print(device)
input_path = 'directory\\'
ds_columns = ['ID', 'PatientID', 'Modality', 'StudyInstance',
              'SeriesInstance', 'PhotoInterpretation', 'Position0',
              'Position1', 'Position2', 'Orientation0', 'Orientation1',
              'Orientation2', 'Orientation3', 'Orientation4', 'Orientation5',
              'PixelSpacing0', 'PixelSpacing1']
def extract_dicom_features(ds):
    ds_items = [ds.SOPInstanceUID,
                ds.PatientID,
                ds.Modality,
                ds.StudyInstanceUID,
                ds.SeriesInstanceUID,
                ds.PhotometricInterpretation,
                ds.ImagePositionPatient,
                ds.ImageOrientationPatient,
                ds.PixelSpacing]
    line = []
    for item in ds_items:
        if type(item) is pydicom.multival.MultiValue:
            line += [float(x) for x in item]
        else:
            line.append(item)
    return line
list_img = os.listdir(input_path)
print(len(list_img))
print('***********')
print(list_img)
print('***********')
df_features = []
for img in tqdm.tqdm(list_img):
    img_path = input_path + img
    ds = pydicom.read_file(img_path)
    df_features.append(extract_dicom_features(ds))
df_features = pd.DataFrame(df_features, columns=ds_columns)
print(df_features)
print('***********')
df_features.head()
df_features.to_csv('\\test_meta.csv')
print(Counter(df_features['PatientID']))
print('***********')
df_features['ID'] = df_features['ID'].astype(str) + ".dcm"
print(df_features)
print('***********')
dest_path = '\\sorted'
counter = 0
prev_fold = '\\' + df_features.loc[0, 'PatientID']
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    counter=counter + 1
    fold = '\\' + rows['PatientID']
    dest_fold = dest_path + fold
    out_file = os.path.join(dest_fold, filename)
    print(out_file)
    print('-------------')
    if not os.path.exists(dest_fold):
        os.mkdir(dest_fold)
    os.rename(os.path.join(input_path, rows['ID']), out_file)
    if fold != prev_fold:
        counter = 0
        prev_fold = fold
Upvotes: 0
Reputation: 3404
I'd also second ditching CV for this - it is overkill.
Try pydicom instead.
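What I'd do for your problem (move all files with the same patient ID into their own folder, and count how many there are for each) is:
Read the files with pydicom (use glob.glob to search a directory and/or just pass in the full file list via argv):
import glob
import sys
import pydicom

files = []
for fname in glob.glob(sys.argv[1], recursive=False):
    print("loading: {}".format(fname))
    # dcmread is the current name; older pydicom releases also offer read_file
    files.append(pydicom.dcmread(fname))
Then create a folder per patient ID, move each file into it, and count as you go:
import os
import shutil
from collections import defaultdict

# dict for counting number of files for each patient ID
patient_id_count = defaultdict(int)
for f in files:
    patient_id = str(f.PatientID)  # this gets the patient ID from the current file
    os.makedirs(patient_id, exist_ok=True)  # create the folder if it doesn't exist
    shutil.move(f.filename, patient_id)  # f.filename is the path the file was read from
    patient_id_count[patient_id] += 1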
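The same steps can be wrapped in one function without holding every dataset in memory. This is a sketch under stated assumptions: sort_by_patient is a hypothetical name, and read_patient_id is a caller-supplied function (not a pydicom API) that returns the PatientID for a path:

```python
import os
import shutil
from collections import Counter


def sort_by_patient(paths, dest_root, read_patient_id):
    """Move each file into dest_root/<PatientID>/ and count files per patient.

    read_patient_id is a caller-supplied function (an assumption of this
    sketch) that takes a file path and returns the PatientID as a string.
    """
    counts = Counter()
    for path in paths:
        patient_id = read_patient_id(path)
        target_dir = os.path.join(dest_root, patient_id)
        os.makedirs(target_dir, exist_ok=True)  # one folder per patient
        shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
        counts[patient_id] += 1
    return counts
```

With pydicom installed, read_patient_id could be something like lambda p: str(pydicom.dcmread(p, stop_before_pixels=True).PatientID), which reads only the header and skips the pixel data.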
Upvotes: 1