rzaratx

Reputation: 824

Sort DICOM images using the metadata?

I am trying to sort the DICOM files of multiple subjects into their respective folders based on their PatientID. The current directory holds the DICOM files for all subjects, unsorted. I can already go through the directory, group the files by PatientID, and count how many each subject has. Is it possible to copy or move the files to another directory and sort them into one folder per PatientID?

code:

os.listdir('\\dicoms')

device = torch.device("cuda")
print(device)
input_path = '\\dicoms\\'

ds_columns = ['ID', 'PatientID', 'Modality', 'StudyInstance',
                'SeriesInstance', 'PhotoInterpretation', 'Position0',
                'Position1', 'Position2', 'Orientation0', 'Orientation1',
                'Orientation2', 'Orientation3', 'Orientation4', 'Orientation5',
                'PixelSpacing0', 'PixelSpacing1']

def extract_dicom_features(ds):
    ds_items = [ds.SOPInstanceUID,
                ds.PatientID,
                ds.Modality,
                ds.StudyInstanceUID,
                ds.SeriesInstanceUID,
                ds.PhotometricInterpretation,
                ds.ImagePositionPatient,
                ds.ImageOrientationPatient,
                ds.PixelSpacing]
    line = []
    for item in ds_items:
        if type(item) is pydicom.multival.MultiValue:
            line += [float(x) for x in item]
        else:
            line.append(item)
    return line

list_img = os.listdir(input_path + 'imgs')
print(len(list_img))
df_features = []
for img in tqdm.tqdm(list_img):
    img_path = input_path + 'imgs/' + img
    ds = pydicom.read_file(img_path)
    df_features.append(extract_dicom_features(ds))
df_features = pd.DataFrame(df_features, columns=ds_columns)

df_features.head()
df_features.to_csv('\\meta.csv')
print(Counter(df_features['PatientID']))

example of metadata:

,ID,PatientID,Modality,StudyInstance,SeriesInstance,PhotoInterpretation,Position0,Position1,Position2,Orientation0,Orientation1,Orientation2,Orientation3,Orientation4,Orientation5,PixelSpacing0,PixelSpacing1


0,ID_000012eaf,ID_f15c0eee,CT,ID_30ea2b02d4,ID_0ab5820b2a,MONOCHROME2,-125.0,-115.89798,77.970825,1.0,0.0,0.0,0.0,0.927184,-0.374607,0.488281,0.488281

example of Counter output:

Counter({'ID_19702df6': 28, 'ID_b799ed34': 26, 'ID_e3523464': 26, 'ID_cd9169c2': 26, 'ID_e326a8a4': 24, 'ID_45da90cb': 24, 'ID_99e4f787': 24, 'ID_df751e93': 24, 'ID_929a5b39': 20})

I added the following code to try to sort the images into subdirectories but I run into an error:

dest_path = input_path+'imageProcessDir'
counter = 0
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    image = cv2.imread(input_path+rows['ID'])
    counter=counter+1
    fold = rows['PatientID']+"/"
    dest_fold = dest_path+fold
    cv2.imwrite(dest_fold+"/"+filename+ "_" +str(counter)+".dcm", img)

error:

Traceback (most recent call last):
  File "ct_move.py", line 77, in <module>
    cv2.imwrite(dest_fold+"/"+filename+ "_" +str(counter)+".dcm", img)
TypeError: Expected cv::UMat for argument 'img'

Upvotes: 0

Views: 1189

Answers (3)

sturgemeister

Reputation: 456

To address your issue: using OpenCV here at all seems like overkill. If all you want to do is move the DICOM files from one location to another on the filesystem, you could use os.rename or shutil.move (shutil.move also works when the destination is on a different filesystem, where os.rename can fail). Unless you are modifying image content, these are cleaner and faster solutions.

I noticed two small things in your last code block:

  • The fold variable needs the "/" as a prefix rather than a suffix for the paths to work.

  • The counter keeps incrementing across all DICOM files, whereas I think you want it to increment on a per-subject basis. (I am assuming that df_features is sorted on PatientID here; if it is not, you could use the Counter class.)
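If df_features is not sorted on PatientID, one way to keep a separate counter per subject is a dictionary keyed by PatientID. This is only a sketch: numbered_names is a made-up helper, and it assumes records is a sequence of (file stem, PatientID) pairs, for example built from the ID and PatientID columns.

```python
from collections import defaultdict


def numbered_names(records):
    """Yield (patient_id, new_filename) with one running counter per patient.

    `records` is assumed to be an iterable of (file_stem, patient_id) pairs,
    e.g. zip(df_features['ID'], df_features['PatientID']). The rows do not
    need to be sorted by patient.
    """
    per_patient = defaultdict(int)  # patient_id -> files seen so far
    for stem, patient_id in records:
        per_patient[patient_id] += 1
        yield patient_id, "{}_{}.dcm".format(stem, per_patient[patient_id])
```

Because the count is looked up by PatientID on every row, interleaved subjects still get their own 1, 2, 3, ... numbering.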

from os.path import basename

dest_path = input_path + 'imageProcessDir'
counter = 0
prev_fold = None
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    fold = '/' + rows['PatientID']
    if fold != prev_fold:
        counter = 0  # reset when the PatientID changes, before numbering this file
    prev_fold = fold
    counter = counter + 1
    dest_fold = dest_path + fold
    out_file = dest_fold + "/" + filename + "_" + str(counter) + ".dcm"
    os.rename(input_path + rows['ID'], out_file)

I would also use os.path.join to handle filesystem paths instead of adding "/" to everything:

fold = rows['PatientID']
dest_fold = os.path.join(dest_path, fold)

I suggest this because I think there is also an issue with the input file path: input_path + rows['ID']
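To make the path handling concrete, here is a small sketch that builds both paths with os.path.join. build_paths is a hypothetical helper, and it assumes each file on disk is named after the ID column plus a ".dcm" extension, which may not match your directory.

```python
import os


def build_paths(input_path, dest_path, file_id, patient_id):
    """Return (source, destination) paths for one DICOM file.

    Assumption: the file on disk is called <file_id>.dcm and should end up
    in dest_path/<patient_id>/ under the same name.
    """
    name = file_id + ".dcm"
    src = os.path.join(input_path, name)
    dst = os.path.join(dest_path, patient_id, name)
    return src, dst
```

os.path.join picks the right separator for the platform, so the same code works with "\\" on Windows and "/" elsewhere.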

edit:

Here is the same loop with the '/' handling replaced by os.path.join:

from os.path import basename

dest_path = os.path.join(input_path, 'imageProcessDir')
counter = 0
prev_fold = None
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    fold = rows['PatientID']
    if fold != prev_fold:
        counter = 0  # reset when the PatientID changes, before numbering this file
    prev_fold = fold
    counter = counter + 1
    dest_fold = os.path.join(dest_path, fold)
    os.makedirs(dest_fold, exist_ok=True)  # make sure target folder exists
    out_file = os.path.join(dest_fold, filename + "_" + str(counter) + ".dcm")
    os.rename(os.path.join(input_path, rows['ID']), out_file)

Also, note that os.rename(os.path.join(input_path, rows['ID']), out_file) may need to be os.rename(os.path.join(input_path, rows['ID'] + '.dcm'), out_file)

If it's not too much, you may want to make a backup of your files before attempting this, to make sure you get what you want out!
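If a full backup is impractical, another option is to copy instead of move, so the originals stay where they are until you have checked the result. A rough sketch, assuming records is a sequence of (filename, PatientID) pairs; copy_into_patient_folders is a name I made up for illustration:

```python
import os
import shutil


def copy_into_patient_folders(records, src_dir, dest_dir):
    """Copy each file into dest_dir/<patient_id>/, leaving the source untouched.

    `records` is assumed to be an iterable of (filename, patient_id) pairs,
    e.g. zip(df_features['ID'], df_features['PatientID']).
    Returns the list of destination paths that were written.
    """
    written = []
    for filename, patient_id in records:
        target_dir = os.path.join(dest_dir, str(patient_id))
        os.makedirs(target_dir, exist_ok=True)  # create the patient folder if needed
        target = os.path.join(target_dir, filename)
        shutil.copy2(os.path.join(src_dir, filename), target)  # copy2 keeps timestamps
        written.append(target)
    return written
```

Once the sorted tree looks right, the unsorted directory can be deleted by hand.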

Upvotes: 1

rzaratx

Reputation: 824

Thank you, I solved the problem with your help.

Solution:

os.listdir('directory')
device = torch.device("cuda")
print(device)
input_path = 'directory\\'

ds_columns = ['ID', 'PatientID', 'Modality', 'StudyInstance',
                'SeriesInstance', 'PhotoInterpretation', 'Position0',
                'Position1', 'Position2', 'Orientation0', 'Orientation1',
                'Orientation2', 'Orientation3', 'Orientation4', 'Orientation5',
                'PixelSpacing0', 'PixelSpacing1']

def extract_dicom_features(ds):
    ds_items = [ds.SOPInstanceUID,
                ds.PatientID,
                ds.Modality,
                ds.StudyInstanceUID,
                ds.SeriesInstanceUID,
                ds.PhotometricInterpretation,
                ds.ImagePositionPatient,
                ds.ImageOrientationPatient,
                ds.PixelSpacing]
    line = []
    for item in ds_items:
        if type(item) is pydicom.multival.MultiValue:
            line += [float(x) for x in item]
        else:
            line.append(item)
    return line

list_img = os.listdir(input_path)
print(len(list_img))
print('***********')
print(list_img)
print('***********')

df_features = []
for img in tqdm.tqdm(list_img):
    img_path = input_path + img
    ds = pydicom.read_file(img_path)
    df_features.append(extract_dicom_features(ds))
df_features = pd.DataFrame(df_features, columns=ds_columns)
print(df_features)
print('***********')
df_features.head()
df_features.to_csv('\\test_meta.csv')
print(Counter(df_features['PatientID']))
print('***********')
df_features['ID'] = df_features['ID'].astype(str) + ".dcm"
print(df_features)
print('***********')

dest_path = '\\sorted'
for index, rows in df_features.iterrows():
    filename = basename(rows['ID'])
    # one subfolder per PatientID
    dest_fold = os.path.join(dest_path, rows['PatientID'])

    out_file = os.path.join(dest_fold, filename)
    print(out_file)
    print('-------------')
    # create the patient folder (and dest_path itself) if it is missing
    os.makedirs(dest_fold, exist_ok=True)
    os.rename(os.path.join(input_path, rows['ID']), out_file)

Upvotes: 0

Richard

Reputation: 3404

I'd also second ditching CV for this - it is overkill.

Try pydicom instead.

What I'd do for your problem (move all files with same patient ID into their own folder, and count how many for each) is:

  1. get the DICOM files as a list (use glob.glob to search a directory and/or just pass in the full file list via argv)
  2. load all those files into a list of pydicom dataset objects (Dataset), so something like:
import glob
import sys

import pydicom

files = []
for fname in glob.glob(sys.argv[1], recursive=False):
    print("loading: {}".format(fname))
    # dcmread is the current name; older pydicom releases also accept read_file
    files.append(pydicom.dcmread(fname))
  3. go through that list and move each file, creating the patient's directory if required. So something like (untested, just showing how conceptually to do it):
import os
import shutil
from collections import defaultdict

# dict for counting number of files for each patient ID
patient_id_count = defaultdict(int)

for f in files:
    patient_id = str(f.PatientID)  # this gets the patient ID from the current file
    os.makedirs(patient_id, exist_ok=True)  # create the directory if it doesn't exist
    shutil.move(f.filename, patient_id)  # f.filename is the path the dataset was read from
    patient_id_count[patient_id] += 1
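Putting the steps together, something along these lines might work. Treat it as a sketch rather than tested production code: sort_by_patient and read_patient_id are names I made up, and it assumes a pydicom version that provides dcmread with the stop_before_pixels option, so only the header is read and large pixel data is skipped.

```python
import os
import shutil
from collections import Counter


def read_patient_id(path):
    """Read only the DICOM header and return its PatientID as a string."""
    import pydicom  # imported here so the mover below can be tried without pydicom

    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return str(ds.PatientID)


def sort_by_patient(paths, dest_dir, get_id=read_patient_id):
    """Move each file into dest_dir/<PatientID>/ and count files per patient.

    `get_id` maps a file path to a patient ID; it defaults to reading the
    DICOM header, but any callable works (handy for a dry run).
    """
    counts = Counter()
    for path in paths:
        patient_id = get_id(path)
        target_dir = os.path.join(dest_dir, patient_id)
        os.makedirs(target_dir, exist_ok=True)  # create the folder if required
        shutil.move(path, os.path.join(target_dir, os.path.basename(path)))
        counts[patient_id] += 1
    return counts
```

Usage would be roughly sort_by_patient(glob.glob(pattern), 'sorted'), with the returned Counter giving the number of files per patient.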

Upvotes: 1
