holger94

Reputation: 21

C search words in a String

I hope someone can help me. I think it's an easy question: I want to write a program that searches for words in a file.

char *such = "Ingo";
char *fund;
FILE *datei;
char text[100];

datei = fopen("names.txt", "r");

if (datei == NULL) {
    printf("Fehler\n");
}
else 
{
    fscanf(datei, "%100c", text);
    text[100] = '\0';
    // I think this doesn't work
    if (fgets(text, 100, datei) != NULL)
    {
        printf("%s \n", text);
    }   
}

return 0;

The file contains this:

Ingo Test Test 123 Test Ingo Ingo

Now I want to count how often the name "Ingo" occurs in the file.

Is it also possible to search for several words, e.g. "ingo" and "test", and count them?

Upvotes: 1

Views: 4238

Answers (3)

David C. Rankin

Reputation: 84551

There are a number of conditions you should test for to ensure you are matching only whole words. The following is one approach that, when searching for jury, matches jury and jury's but not injury. You should also consider whether you want to match plurals of a word (e.g. review and reviews). Below, a single collection of delimiters (delim) is used to ensure you match whole words. You could easily split that into two sets, one for the beginning and one for the end of a word, if you wanted to match plurals or various other suffixes.

The code expects the filename to search as the first argument and the search term (sterm) as the second. (If no arguments are given, it searches the text on stdin for 'the'.) The code reads each line of the file into a buffer called line and then scans line for the first character of sterm. When it finds one, it checks that the previous character is a delimiter (or that the candidate starts the line) and that the character following the word (slen characters later) is also a delimiter. If both hold, the contents are compared with sterm using strncmp.

If all conditions are satisfied, the count is incremented and the match is printed along with its line number and zero-based position in line. This is just a basic whole-word search that has not been optimized, but it should give you a starting place for discriminating whole words from lesser included substrings (i.e. searching for 'the' will not also match 'them', 'then', 'they', etc.). You can also turn this code into a function that saves the line number and position of each match in an array of structs and returns a pointer to that array. That way you can parse your text and get back the line and position of each match. (That's for another day.)

Look over the code and let me know if you have questions. If you are not concerned with matching only whole words, you could simply call strstr repeatedly on each line, advancing a pointer past each match, to count the occurrences of the search term. Whatever best meets your needs.

#include <stdio.h>
#include <string.h>

#define MAXS 256

int main (int argc, char **argv)
{
    char line[MAXS] = {0};  /* line buffer for fgets */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    char *sterm = argc > 2 ? argv[2] : "the";
    char *delim = " \t\n\'\".";
    size_t count = 0, idx = 0, slen = strlen (sterm);

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAXS, fp))
    {
        size_t i, llen = strlen (line);
        idx++;

        if (llen < slen + 1)
            continue;       /* line not longer than search term + \n */

        for (i = 0; i < llen - slen + 1; i++) {

            if (line[i] != *sterm)
                continue;   /* char != first char in sterm  */
            if (i && !strchr (delim, line[i-1]))
                continue;   /* prior char is not a delim    */
            if (!strchr (delim, line[i+slen]))
                continue;   /* next char is not a delim     */
            if (strncmp (&line[i], sterm, slen))
                continue;   /* chars don't match sterm      */

            /* i is already a size_t, so it matches %zu exactly */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, i);
        }
    }
    if (fp != stdin) fclose (fp);

    printf ("\n total occurrences of '%s' in '%s' : %zu\n\n",
            sterm, argc > 1 ? argv[1] : "stdin", count);

    return 0;
}

Sample File

$ cat dat/damages.txt
Personal injury damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal injury case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal injury cases. Thus, while a jury has latitude in
assessing intangible damages in personal injury cases,
a jury's damage award does not escape the scrutiny of
appellate review.

Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.

Output

$ ./bin/searchterm dat/damages.txt jury
 line[ 3] match  1. 'jury' at location 0
 line[ 6] match  2. 'jury' at location 22
 line[ 9] match  3. 'jury' at location 37
 line[11] match  4. 'jury' at location 2

 total occurrences of 'jury' in 'dat/damages.txt' : 4

or

$ ./bin/searchterm <dat/damages.txt
 line[ 2] match  1. 'the' at location 50
 line[ 3] match  2. 'the' at location 39
 line[ 4] match  3. 'the' at location 43
 line[ 5] match  4. 'the' at location 48
 line[ 6] match  5. 'the' at location 18
 line[ 7] match  6. 'the' at location 11
 line[11] match  7. 'the' at location 38
 line[17] match  8. 'the' at location 10
 line[19] match  9. 'the' at location 34
 line[20] match 10. 'the' at location 13
 line[21] match 11. 'the' at location 42
 line[23] match 12. 'the' at location 12

 total occurrences of 'the' in 'stdin' : 12
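
The function variant mentioned earlier (saving the line number and position of each match in an array of structs) might look roughly like the sketch below. This is only one possible shape, not part of the program above: the names wordmatch and find_matches are hypothetical, the delimiter set is copied from delim, and the caller is assumed to free the returned array. Like the program above, it treats every fgets read as one line, so lines longer than MAXS - 1 characters would be counted as several.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXS 256

/* hypothetical record type: one entry per whole-word match */
typedef struct {
    size_t line;    /* 1-based line number            */
    size_t pos;     /* zero-based offset in that line */
} wordmatch;

/* Collect whole-word matches of sterm in fp. Returns a malloc'ed array
 * (caller frees) and stores its length in *nmatch. Returns NULL when
 * nothing matched or an allocation failed. */
wordmatch *find_matches (FILE *fp, const char *sterm, size_t *nmatch)
{
    const char *delim = " \t\n\'\".";
    char line[MAXS];
    wordmatch *m = NULL;
    size_t cap = 0, idx = 0, slen = strlen (sterm);

    *nmatch = 0;
    if (slen == 0)
        return NULL;

    while (fgets (line, MAXS, fp)) {
        size_t i, llen = strlen (line);
        idx++;

        for (i = 0; i + slen <= llen; i++) {
            if (strncmp (&line[i], sterm, slen))
                continue;   /* chars don't match sterm   */
            if (i && !strchr (delim, line[i-1]))
                continue;   /* prior char is not a delim */
            if (!strchr (delim, line[i+slen]))
                continue;   /* next char is not a delim  */

            if (*nmatch == cap) {           /* grow the array */
                size_t ncap = cap ? cap * 2 : 8;
                wordmatch *tmp = realloc (m, ncap * sizeof *tmp);
                if (!tmp) {
                    free (m);
                    *nmatch = 0;
                    return NULL;
                }
                m = tmp;
                cap = ncap;
            }
            m[*nmatch].line = idx;
            m[*nmatch].pos = i;
            (*nmatch)++;
        }
    }
    return m;
}
```

A caller would open the file, call find_matches (fp, "jury", &n), loop over the n entries to print m[k].line and m[k].pos, and then free (m).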

Using a pointer instead of array index notation

You may find it a bit more natural to use a pointer instead of array index notation. (e.g. using char *p = line; and advancing p, instead of using line[X] notation). If so, you can replace the read loop with the following:

    while (fgets (line, MAXS, fp))
    {
        char *p = line;
        size_t llen = strlen (line);
        idx++;

        if (llen < slen + 1)
            continue;       /* line not longer than search term + \n */

        for (;p < (line + llen - slen + 1); p++) {

            if (*p != *sterm)
                continue;   /* char != first char in sterm  */
            if (p > line && !strchr (delim, *(p - 1)))
                continue;   /* prior char is not a delim    */
            if (!strchr (delim, *(p + slen)))
                continue;   /* next char is not a delim     */
            if (strncmp (p, sterm, slen))
                continue;   /* chars don't match sterm      */

            /* p - line is a ptrdiff_t; cast it so it matches %zu */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, (size_t)(p - line));
        }
    }

The pointer notation is probably a bit more natural in C. Let me know if you have any questions.
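
For the simpler case mentioned earlier, where whole-word matching does not matter, repeatedly calling strstr could be sketched as follows. The helper name count_substr is made up for illustration. It counts non-overlapping substring matches, so searching for jury would also count the one inside injury.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of term in text.
 * This is a plain substring match, not a whole-word match. */
size_t count_substr (const char *text, const char *term)
{
    size_t n = 0, tlen = strlen (term);
    const char *p = text;

    if (tlen == 0)          /* an empty term would never advance p */
        return 0;

    while ((p = strstr (p, term)) != NULL) {
        n++;
        p += tlen;          /* step past this match */
    }
    return n;
}
```

Inside the fgets loop you would add count_substr (line, sterm) to a running total for each line read.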

Upvotes: 3

BLUEPIXY

Reputation: 40145

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char *such = "Ingo";
    FILE *datei;
    char word[100];
    int counter = 0;

    datei = fopen("names.txt", "r");

    if (datei == NULL) {
        printf("Fehler\n");
    }
    else 
    {
        while(1==fscanf(datei, "%99s", word)){//read word by word
            word[0] = toupper((unsigned char)word[0]); //ingo --> Ingo
            if (strcmp(word, such) == 0){
                ++counter;
            }
        }
        fclose(datei);
        if (counter != 0){
            printf("number of '%s' is %d\n", such, counter);
        }   

    }

    return 0;
}

Upvotes: 2

Aleksandar Makragić

Reputation: 1997

There are two very simple ways to accomplish this:

  1. In a loop, use fscanf to read words from the file until you reach EOF, and compare each word against the one you're looking for with strcmp (string compare) from string.h.

  2. Use two loops: in the inner loop, read characters with fgetc into a temporary char array until you reach a delimiter such as a space, \n or \t; in the outer loop, check whether the word you just collected is the one you're looking for.
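
The second approach could be sketched roughly as below, extended to count several search words in one pass. The names next_word, count_words and MAXWORD are hypothetical, words are assumed to be separated by whitespace only, and the comparison is exact and case-sensitive (so "ingo" and "Ingo" are counted separately).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXWORD 100

/* Read the next whitespace-delimited word from fp into buf (size >= 1).
 * At most size-1 chars are stored; the rest of a longer word is dropped.
 * Returns 1 if a word was read, 0 at end of file. */
static int next_word (FILE *fp, char *buf, size_t size)
{
    int c;
    size_t n = 0;

    while ((c = fgetc (fp)) != EOF && isspace (c))
        ;                       /* skip leading delimiters */
    if (c == EOF)
        return 0;

    do {                        /* collect chars up to the next delimiter */
        if (n + 1 < size)
            buf[n++] = (char)c;
    } while ((c = fgetc (fp)) != EOF && !isspace (c));

    buf[n] = '\0';
    return 1;
}

/* Count how often each of the nterms words occurs in fp.
 * counts[i] receives the tally for terms[i]. */
void count_words (FILE *fp, const char *const *terms, size_t nterms,
                  size_t *counts)
{
    char word[MAXWORD];
    size_t i;

    for (i = 0; i < nterms; i++)
        counts[i] = 0;

    while (next_word (fp, word, sizeof word))
        for (i = 0; i < nterms; i++)
            if (strcmp (word, terms[i]) == 0)
                counts[i]++;
}
```

With the file from the question and the terms "Ingo" and "Test", both counts come out as 3. To ignore case, you could lowercase each word and each term before comparing.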

Upvotes: 1
