I am trying to read DICOM files using pydicom in Python and want to store the header data into a pandas dataframe. How do I extract the data element value for this purpose?
So far I have created a dataframe whose columns are the tag names in the DICOM file. I can access each data element, but I only want to store its value, not the whole element. To get it, I converted the element to a string and tried to split it, but that doesn't work either because the different tags have different lengths.
import pydicom as dicom
import pandas as pd

refDs = dicom.dcmread('000000.dcm')
info_header = refDs.dir()
df = pd.DataFrame(columns=info_header)
print(df)
info_data = []
for i in info_header:
    if i in refDs:
        info_data.append(str(refDs.data_element(i)).split(" ")[0])
print(info_data[0], len(info_data))
I put the data elements in a list because I could not put them into the dataframe directly. The output of the code above is:
(0008, 0050) Accession Number SH: '1091888302507299' 89
But I only want to store the data inside the quotes.
This works for me:
import pydicom as dicom
import pandas as pd

# dcmread replaces read_file, which is deprecated and removed in pydicom 3.0
ds = dicom.dcmread('path_to_file')
df = pd.DataFrame(ds.values())
# Elements not yet accessed are stored as RawDataElement; convert them to DataElement
df[0] = df[0].apply(lambda x: dicom.dataelem.DataElement_from_raw(x) if isinstance(x, dicom.dataelem.RawDataElement) else x)
df['name'] = df[0].apply(lambda x: x.name)
df['value'] = df[0].apply(lambda x: x.value)
df = df[['name', 'value']]
Optionally, you can then transpose it into a single row with one column per element name:
df = df.set_index('name').T.reset_index(drop=True)
Nested fields (sequence elements, VR `SQ`) would require more work if you also need them.