How can I extract information from a DICOM file?

Question

I want to write a program that extracts the header information of a DICOM file using C or C++, and I don't want to use external libraries like dicomsdl. When I open the file with Notepad I see special characters along with character strings such as the patient name. Can anyone help me read this file?

thurizas · Accepted Answer

Yes, I would open the file in binary mode, even though it might contain sequences of characters. Without going too deep into it, consider writing the following record out to a file (I'm showing the record as a C struct):

    struct rec_tag
    {
         int    id;
         char   name[50];
    };

Now, suppose I use that structure to create a file, as shown in the following code:

file.c:

/* compile as: gcc -ansi -pedantic -Wall file.c -o file_test */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


struct rec_tag
{
    int   id;
    char  name[50];
};

int main(int argc, char** argv)
{
    FILE*          fp = NULL;
    struct rec_tag rec1;
    struct rec_tag rec2;

    rec1.id = 20;
    strcpy(rec1.name, "thurizas");

    rec2.id = 345689;
    strcpy(rec2.name, "Marouane");

    if(NULL != (fp = fopen("./short.dat", "ab")))
    {
         fwrite(&rec1, sizeof(struct rec_tag), 1, fp);
         fwrite(&rec2, sizeof(struct rec_tag), 1, fp);

         fclose(fp);
    }
    return 0;
}

Now, if I open this file in emacs, I see lots of special symbols (such as ^T and ^@) interspersed with strings. It can be instructive to open the file in a hex editor (say okteta), where we see:

    14 00 00 00 74 68 75 72 69 7A 61 73 00 00 00 00 01 00 00 00 00 00
    00 00 ED 06 40 00 00 00 00 00 C2 00 00 00 00 00 00 00 00 00 00 00 
    00 00 00 00 A0 06 40 00 00 00 00 00 59 46 05 00 4D 61 72 6F 75 61 
    6E 65 00 7F 00 00 2E 4E 3D F6 00 00 00 00 67 03 40 00 00 00 00 00 
    FF FF FF FF 00 00 00 00 C0 B5 B3 C5 FF 7F 00 00 38 F1 CA BE 31 7F 
    00 00

Now, the sequence of hex digits 74 68 75 72 69 7A 61 73 is the ASCII encoding of "thurizas" (which most editors will display). The first four bytes in the file are the id number. This presents another (potential) issue: I created the file on a computer with an x86_64 processor, so an integer is stored in little-endian form, least significant byte first. The sequence 14 00 00 00 therefore needs to be read ... backwards (for lack of a better term) as 00 00 00 14, which is the 32-bit hexadecimal representation of 20.

Also, notice that because I was not particularly careful about how I treated my character arrays, there are extraneous garbage bytes in the file after each name.

Now, without knowing the format of the file (i.e. how data is written to the file), I would have a hard time figuring out how to read it in. However, because we know the format, we can write a simple program to read it:

file1.c:

/* compile as: gcc -ansi -pedantic -Wall file1.c -o read_test */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rec_tag
{
    int   id;
    char  name[50];
};

int main(int argc, char** argv)
{
     FILE*          fp = NULL;
     struct rec_tag rec1;
     struct rec_tag rec2;

     if(NULL != (fp = fopen("./short.dat", "rb")))
     {
          fread(&rec1, sizeof(struct rec_tag), 1, fp);
          fread(&rec2, sizeof(struct rec_tag), 1, fp);

          printf("id: %d, name: %s\n", rec1.id, rec1.name);
          printf("id: %d, name: %s\n", rec2.id, rec2.name);

          fclose(fp);
     }

     return 0;
 }

and when run, produces this result:

    [******@broadsword junk]$ ./read_test
    id: 20, name: thurizas
    id: 345689, name: Marouane

Hopefully this helps with how to interpret a file and shows one way of reading it in. So in your situation, I would take the following steps:

1. Get and read the formal specification for a DICOM file.
2. Try a "hand" read of the file. Open the file up in a hex editor, and using the specification see if you can step through the file and figure out how the data is stored.
3. Write a program to read in the data.

Finally, the disclaimers:

All code was compiled using gcc version 4.8.2 and run on a CentOS 7 system.
I know that the b flag to fopen is ignored on all POSIX-compliant systems (including Linux). I put it there in case the code is run on a non-POSIX system, and also to be explicit that I was doing binary I/O.
Error checking and handling was kept to a minimum to prevent this post from becoming a wall-of-text (which it did).

Hope this helps, T.
