Scripted
Scripted

Reputation: 9

What is wrong with this sorting algorithm in C?

I was attempting a problem where I had to arrange a list of 5000 first names alphabetically (The names were stored in a text file "names.txt"). As seen from my code below, I created a 2D array names[n][m] to store the names.

For every single name, I compare it against all the other names alphabetically. Whenever the i-th name is alphabetically larger than another, there will be an increment to its alphabetical ranking stored in its array element rank[i]. For example, when "Mary" is compared to "Denise", Mary's rank will be incremented by 1 since it is alphabetically larger than Denise. All the ranks start from 1.

This seemed to work as it was successful when tested with the example provided in the question. However, the final answer I obtained was incorrect. More importantly, I discovered that several of the names shared the same ranking (i.e. I checked and found that "Columbus" and "Colt" both have the same ranking). I'm not sure why or where my algorithm is flawed, as it seems logically sound(?) to me. I tried making my code more readable by adding a few comments, and I would appreciate if someone could explain my mistake to me. I've only been coding for about a few days, so forgive me if I had committed any rookie mistakes. Thanks for your time!

Link to problem: https://projecteuler.net/problem=22

EDIT: The code is slightly truncated (I omitted the final step where I just added all the scores together). But the error I talked about is only relevant to the provided code. Thanks!

#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>

int main() {
    FILE *fp;
    int i, j;
    int a = 0;
    int b = 0;
    fp = fopen("names.txt", "r");
    char names[5200][30] = { 0 };
    int rank[5200] = { 0 }; //Rank corresponds to their alphabetical positions
    unsigned long long score[5200] = { 0 };
    unsigned long long sum = 0;
    for (i = 0; i < 5200; i++) {
        (rank[i])++;  //All the rankings start from 1.
    }
    for (i = 0; !feof(fp); i++) {
        fscanf(fp, "\"%[A-Z]\",", &names[i]); //Scanning and storing the names from the file into the array.
    }

    for (i = 0; i < 5200; i++) {
        for (j = 0; j < 5200; j++) {
            if (i != j && names[i][0] != 0 && names[j][0] != 0) {
                while (names[i][a] == names[j][a]) {  //If the ith word and jth word have the same letter, then increment a (which advances to the next letter).
                    a++;
                }
                if (names[i][a] > names[j][a]) { 
                    (rank[i])++; //If the ith word has a larger letter than the jth word, there will be an increase in its rank.
                } else
                if (names[j][a] == 0 && names[i][a] != 0) { 
                    (rank[i])++; //If the jth word is shorter than the ith word, then i also increases its rank.
                }
            }
            a = 0;
        }
        for (a = 0; a < 30; a++) {
            if (names[i][a] != 0 && names[i][0] != 0) {
                score[i] += (names[i][a] - 64); //Sum up the alphabetical value (as per the question) for each name.
            }
        }
        score[i] = (rank[i]) * (score[i]);
    }

Upvotes: 0

Views: 83

Answers (1)

chqrlie
chqrlie

Reputation: 144550

Your algorithm works but there are a few implementation problems:

  • the test for (i = 0; !feof(fp); i++) { is incorrect. fscanf() may fail to convert the file contents before the end of file, causing an infinite loop. You should instead test if fscanf() returns 1 for success.
  • you should count the number of words read into the array and restrict the loops to this range.
  • you should not assume that names are not duplicated in the file. The loop while (names[i][a] == names[j][a]) { a++ } has undefined behavior if i and j have the same contents. Indeed using strcmp() to compare the names is simpler and safer.
  • there is no need to keep the ranks and scores of all names, you can compute the sum on the fly inside the outer loop. This saves the initialization code too.

Here is a corrected and simplified version:

#include <stdio.h>
#include <string.h>

int main() {
    char names[5200][30];
    FILE *fp;
    int i, j, n, a, rank, score;
    unsigned long long sum = 0;

    fp = fopen("p022_names.txt", "r");
    if (fp == NULL)
        return 1;

    // Scan and store the names from the file into the array.
    for (n = 0; n < 5200; n++) {
        if (fscanf(fp, "\"%29[A-Z]\",", names[n]) != 1)
            break;
    }
    fclose(fp);

    // n first names read from file.
    for (i = 0; i < n; i++) {
        rank = 1;
        for (j = 0; j < n; j++) {
            if (strcmp(names[i], names[j]) > 0)
                rank++;
        }
        score = 0;
        for (a = 0; names[i][a] != '\0'; a++) {
            // Sum up the alphabetical value (as per the question) for each name.
            score += names[i][a] - 'A' + 1;
        }
        sum += rank * score;
    }
    printf("Total score of %d names: %lld\n", n, sum);
    return 0;
}

Output:

Total score of 5163 names: 871198282

Upvotes: 2

Related Questions