Natalia Hassan

Reputation: 15

Is it possible to read character array elements into a struct using `strtoul` in C?

I'm working on a project for a class and I could use some guidance. I need to parse a character array into constituent parts - the specifications of which I am given - but I am unsure how to do so in C.

I have been given a file and each page of the file is read into a buffer as a character array like so:

    typedef struct page_t {
        char reserved[PAGESIZE];
    } page_t;

I have been given the following specifications about the pages read:

  1. Each page starts with a 2-byte gap offset, followed by key-value records, a gap at the indicated offset, and lastly an 8-byte address at the end pointing to the next page
  2. The key-value records have the following form: an 8-byte unsigned integer key followed by a value, where the first 4 bytes of the value are an unsigned integer indicating the length of the string part, followed by a string of that length (so the total length of the value portion is length+4)
  3. There can be multiple key-value records in the file, but the sum of all key-value records will not exceed 4086 bytes, and the gap is always at the end of the file prior to the address of the next page

I have not been given any more explanation about the format of the page, and I need to parse through the char array. I was wondering if I could use the strtoul function to read the array 8 bytes at a time to find the correct key (and to skip over a key's value if it is not the key I am trying to match). I asked my TA about it and the answer I got was:

You can use functions that convert character (byte) arrays to numbers. Consider making a toy example program that converts a structure to a character array and back to see if scan/atoi/strtoll... have the expected behavior. If the functions do not work you can also consider reading iteratively. You may find them useful to extract the key/value size. The value as a string should work!

So I tried making a short program that converted a struct to an array and back and tried using strtoul on the string but I'm not sure that I'm doing it correctly.

So my tester program looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

typedef struct record_test {
    uint64_t key;
    uint32_t val_size;
    char value[255];
} record_test;

int main( int argc, char ** argv ) {
    record_test record = {1234, 13, "asdfghjklqwer"};
    char page[4096];

    // print what is in record
    printf("Here's the record itself:\n");
    printf("key: %llu\n", record.key);
    printf("val_size: %u\n", record.val_size);
    printf("record: %s\n", record.value);

    memcpy(page, &record, sizeof(record_test));

    // print what is in page
    printf("Here's what's in the page:\n");
    printf("page: %s\n", page);

    // check page contents with pointer 
    record_test* revert;
    revert = (record_test*)page;
    printf("Here's the reverted record using pointers:\n");
    printf("key: %llu\n", revert->key);
    printf("val_size: %u\n", revert->val_size);
    printf("record: %s\n", revert->value);
    

    // reading what is in page using strtoul
    char* endKey;
    char* value;

    printf("reading using strtoul:\n");
    printf("key: %lu\n", strtoul(page, &endKey, 8)); 
    printf("val size: %d\n", (int)strtoul(endKey, &value, 4));
    printf("value: %s\n", value);
}

And these are the results I'm getting from it when I use printf to follow it:

Here's the record itself:
key: 1234
val_size: 13
record: asdfghjklqwer
Here's what's in the page:
page: ?
Here's the reverted record using pointers:
key: 1234
val_size: 13
record: asdfghjklqwer
reading using strtoul:
key: 0
val size: 0
value: ?

So based on the pointer that I used to recast the struct, the character array does have the right information in it, but for whatever reason the character array itself shows ? when I try to print it, and the printf statements showing what strtoul reads show 0 for the integers. I'm not sure what's going on here. Why am I getting ? when that character isn't even in the value string? Can someone tell me where I am going wrong, or whether I can even use this function at all? Should I be iterating through the character array and using bitwise operations to read it instead?

Any help would be great! Thank you!

Upvotes: 0

Views: 209

Answers (1)

Icemanind

Reputation: 48696

I'm going to try to help you understand what's happening here. When you do memcpy to "flatten" your structure, let's analyze what should be going into memory.

We start out with 1234. Convert that to hexadecimal and it becomes 04D2. A uint64_t is exactly 64 bits wide, which is 8 bytes on any machine you are likely to use (you can verify this with sizeof(uint64_t)), so in memory you can expect the first 8 bytes to be 00 00 00 00 00 00 04 D2.

Next up, you have 13, which in hexadecimal is 0D, and it's in a uint32_t. That type is exactly 32 bits wide, so it is 4 bytes long (again, you can verify with sizeof). This means the next 4 bytes would be 00 00 00 0D.

Finally, you have an array of 255 char. Each char is 1 byte long. Each letter in your text asdfghjklqwer is stored as the ASCII code for that letter, so in hexadecimal the bytes would be 61 73 64 66 67 ... and so on. Because your initializer only covers part of the array, C sets the rest of those 255 bytes to zero. The compiler may also add padding bytes at the end of the struct, and the contents of those are unspecified.

Now one final thing to keep in mind is the endianness of your computer. If your computer has an Intel or AMD processor, then it is little-endian. If you're unfamiliar with endianness, the Wikipedia article on it gives a full explanation. Simply put, endianness refers to the order in which the bytes of a multi-byte value are written to memory. Little-endian (which is probably what you have) means that the least significant byte, the "little end" of the number, is written first.

So what does this mean? Up above I said the first 8 bytes in memory would be 00 00 00 00 00 00 04 D2. For little endian machines, this isn't really true. The bytes are actually written right to left. What's actually in memory would be D2 04 00 00 00 00 00 00. Hopefully this makes sense.

So now, with a few small modifications to your program, you can actually print out what's in your computer's memory and see more clearly what I am talking about.

First, in your program, change char page[4096]; to unsigned char page[4096];. The reason is that plain char is signed on many platforms, so bytes of 80 hex and above can print as negative or sign-extended values, and the dump is easier to read with unsigned characters. If you really want to know how signed and unsigned numbers work in a computer system, Google two's complement to learn more. For now, just change it to unsigned. Then change the start of your main function to this:

record_test record = { 1234, 13, "asdfghjklqwer" };
unsigned char page[4096];

// print what is in record
// (uint64_t is cast so that it matches the %llu format on every platform)
printf("Here's the record itself:\n");
printf("key: %llu\n", (unsigned long long)record.key);
printf("val_size: %u\n", (unsigned)record.val_size);
printf("record: %s\n", record.value);

// flatten the struct into the byte array
memcpy(page, &record, sizeof(record_test));

// print what is in page, one byte per line, in hexadecimal
for (size_t i = 0; i < sizeof(record_test); i++)
{
    printf("page[%zu] = %02X\n", i, page[i]);
}

When you run this program, it will execute the memcpy like before, but then I have it printing out the data stored at the page address. Try modifying your record and see if you can understand what my explanation is all about!

Hopefully this all made sense! Good luck!!!

Upvotes: 0
