Reputation: 1

Comparing strings in two files in C

I'm new to C, so I'd appreciate any help :D I need to compare the words in the first file ("Albert\n Martin\n Bob") with the words in the second file ("Albert\n Randy\n Martin\n Ohio"). Whenever a word appears in both files, I need to write the word "Language" to the output file, and I need to print every word from the first file that has no match in the second file. Something like this needs to end up in my third file: Language Language Bob

I tried to come up with some ideas, but they don't work ;p

Thanks in advance for every answer.

Upvotes: 0

Answers (2)

Brendan

Reputation: 37252

I'd open all three files to begin with (both input files and the output file). If you can't open all of them then you can't do anything useful (other than display an error message); and there's no point wasting CPU time only to find out later that (e.g.) you can't open the output file. This can also help to reduce race conditions (e.g. the second file changing while you're processing the first file).
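A minimal sketch of that first step, assuming plain text files and standard `fopen()`. The helper name `open_all` and its parameters are made up for illustration; the point is only that nothing gets processed unless all three files opened.

```c
#include <stdio.h>

/* Hypothetical helper: opens both input files and the output file up front.
   Returns 0 on success. On failure it reports the file that could not be
   opened, closes whatever was already open, and returns -1. */
int open_all(const char *first_path, const char *second_path,
             const char *out_path,
             FILE **first, FILE **second, FILE **out)
{
    *first = *second = *out = NULL;

    *first = fopen(first_path, "r");
    if (*first == NULL) {
        perror(first_path);
        return -1;
    }

    *second = fopen(second_path, "r");
    if (*second == NULL) {
        perror(second_path);
        fclose(*first);
        *first = NULL;
        return -1;
    }

    *out = fopen(out_path, "w");
    if (*out == NULL) {
        perror(out_path);
        fclose(*first);
        fclose(*second);
        *first = *second = NULL;
        return -1;
    }
    return 0;
}
```

A caller would check the return value once and bail out with an error before doing any reading or hashing.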

Next, start processing the first file. Break it into words/tokens as you read it, and for each word/token calculate a hash value. Then use the hash value and the word/token itself to check if the new word/token is a duplicate of a previous (already known) word/token. If it's not a duplicate, allocate some memory and create a new entry for the word/token and insert the entry onto the linked list that corresponds to the hash.

Finally, process the second file. This is similar to how you processed the first file (break it into words/tokens, calculate the hash, use the hash to find out if the word/token is known), except if the word/token isn't known you write it to the output file, and if it is known you write "Language" to the output file instead.
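Put together, the steps above might look roughly like the sketch below. It is only one way to do it, under some assumptions that are mine: words are separated by whitespace, no word is longer than 63 characters, and the names `WORD_ENTRY`, `hash_word`, `lookup` and `compare_words` are invented for the example. The hash table parts are explained further down.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 1024  /* assumed number of buckets */
#define MAX_WORD  64    /* assumed longest word, including the terminator */

typedef struct word_entry {
    struct word_entry *next;
    char word[MAX_WORD];
} WORD_ENTRY;

/* Shift-and-xor hash, reduced to a bucket number. */
unsigned hash_word(const char *s)
{
    unsigned h = 0;
    while (*s != '\0') {
        h = h ^ (h << 5) ^ (unsigned char)*s;
        s++;
    }
    return h % HASH_SIZE;
}

/* Returns the entry holding `word`, or NULL if it is not in the table. */
WORD_ENTRY *lookup(WORD_ENTRY **table, const char *word)
{
    WORD_ENTRY *e;
    for (e = table[hash_word(word)]; e != NULL; e = e->next)
        if (strcmp(word, e->word) == 0)
            return e;
    return NULL;
}

/* Indexes the words of `indexed`, then scans `scanned` and writes one line
   per scanned word to `out`. Returns 0, or -1 if memory runs out. */
int compare_words(FILE *indexed, FILE *scanned, FILE *out)
{
    WORD_ENTRY *table[HASH_SIZE] = { NULL };
    char word[MAX_WORD];
    int status = 0;
    size_t i;

    /* Step 1: remember each distinct word (the 63 matches MAX_WORD - 1). */
    while (fscanf(indexed, "%63s", word) == 1) {
        if (lookup(table, word) == NULL) {
            unsigned h = hash_word(word);
            WORD_ENTRY *e = malloc(sizeof *e);
            if (e == NULL) { status = -1; break; }
            strcpy(e->word, word);
            e->next = table[h];
            table[h] = e;
        }
    }

    /* Step 2: known words become "Language", unknown ones are copied. */
    while (status == 0 && fscanf(scanned, "%63s", word) == 1)
        fprintf(out, "%s\n", lookup(table, word) != NULL ? "Language" : word);

    /* Release the table so the function can be called again. */
    for (i = 0; i < HASH_SIZE; i++) {
        while (table[i] != NULL) {
            WORD_ENTRY *next = table[i]->next;
            free(table[i]);
            table[i] = next;
        }
    }
    return status;
}
```

Which file you index and which you scan decides what ends up in the output. With the files from the question, indexing the second file and scanning the first one writes Language, Language, Bob (one per line), which is the output the question asks for; the other way round writes Language, Randy, Language, Ohio.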

If you're not familiar with hash tables, they're fairly easy. For a simple method (not necessarily the best method) of calculating the hash value for ASCII/text you could do something like:

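uint32_t hash = 0;                 /* unsigned, so the shift is well defined */
while(*src != 0) {
    hash = hash ^ (hash << 5) ^ (unsigned char)*src;   /* mix in the next character */
    src++;
}
hash = hash % HASH_SIZE;           /* reduce to a valid array index */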

Then you have an array of linked lists, like "INDEX_ENTRY *index[HASH_SIZE]" that contains a pointer to the first entry for each linked list (or NULL if the linked list for the hash is empty).
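For what it's worth, here is one possible shape for the entry type and for adding a new word to its list. The field layout, the `add_entry` name and the array name `word_index` are assumptions for this sketch. I used `word_index` rather than `index` because some C libraries declare an old `index()` function in their string headers, and a global array with the same name can clash with it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 1024

typedef struct index_entry {
    struct index_entry *next;   /* next entry with the same hash, or NULL */
    char *word;                 /* heap-allocated copy of the word */
} INDEX_ENTRY;

/* One list head per hash value; static storage starts out as all NULL. */
INDEX_ENTRY *word_index[HASH_SIZE];

/* Adds a copy of `word` to the front of the list for `hash`.
   The caller is expected to have checked that the word is not already there.
   Returns the new entry, or NULL if memory could not be allocated. */
INDEX_ENTRY *add_entry(uint32_t hash, const char *word)
{
    INDEX_ENTRY *entry = malloc(sizeof *entry);
    if (entry == NULL)
        return NULL;

    entry->word = malloc(strlen(word) + 1);
    if (entry->word == NULL) {
        free(entry);
        return NULL;
    }
    strcpy(entry->word, word);

    /* Inserting at the head is the cheapest option: no list walk needed. */
    entry->next = word_index[hash];
    word_index[hash] = entry;
    return entry;
}
```

Two words that happen to share a hash simply end up on the same list, newest first.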

To search, use the hash to find the first entry of the correct linked list then do "strcmp()" on each entry in the linked list. An example might look something like this:

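/* Returns the entry for new_word, or NULL if the word is not in the table. */
INDEX_ENTRY *find_entry(uint32_t hash, const char *new_word) {
    INDEX_ENTRY *entry;

    entry = index[hash];              /* first entry of the list for this hash */
    while(entry != NULL) {
        if(strcmp(new_word, entry->word) == 0) return entry;
        entry = entry->next;
    }
    return NULL;
}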

The idea of all this is to improve performance. For example, if both files have 1024 words then (without a hash table) you'd need to do "strcmp()" up to 1024*1024 times (about a million); but if you use a hash table with "#define HASH_SIZE 1024" you'll probably reduce that to about 2000 times (and end up with much faster code). Larger values of HASH_SIZE increase the amount of memory you use a little (and reduce the chance of different words having the same hash).

Don't forget to close your files when you're finished with them. Freeing the memory you used is a good idea if you do something else after this (but if you don't do anything after this then it's faster and easier to "exit()" and let the OS clean up).
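If you do want to free the memory, a sketch like this would walk every list and release it. It assumes each entry owns a heap-allocated copy of its word; the struct layout and the `free_index` name are made up for the example.

```c
#include <stdlib.h>

typedef struct index_entry {
    struct index_entry *next;
    char *word;                 /* assumed to be heap-allocated */
} INDEX_ENTRY;

/* Frees every entry on every list and resets the list heads to NULL. */
void free_index(INDEX_ENTRY **heads, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        INDEX_ENTRY *entry = heads[i];
        while (entry != NULL) {
            INDEX_ENTRY *next = entry->next;  /* save before freeing */
            free(entry->word);
            free(entry);
            entry = next;
        }
        heads[i] = NULL;
    }
}
```

The files themselves are closed with `fclose()`; checking its return value on the output file is worthwhile, since that is where a failed write may finally show up.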

Upvotes: 0

THE DOCTOR

Reputation: 4555

First, you need to open a stream to read the files.

If you need to do this in C, then you may use the strcmp function. It lets you compare two strings: it returns 0 when they are equal and a nonzero value otherwise.

Its prototype (from <string.h>) is:

int strcmp(const char *s1, const char *s2);
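A small usage sketch, since the result is easy to get backwards: `strcmp()` returns 0 for equal strings, so the test for "same word" is `== 0`. The wrapper name `same_word` is just something I made up for the example.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the two words are identical, 0 otherwise.
   The comparison is case-sensitive, so "Bob" and "bob" are different words. */
int same_word(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

With the words from the question, `same_word("Albert", "Albert")` is 1 and `same_word("Albert", "Randy")` is 0.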

Upvotes: 1
