Reputation: 21
I hope someone can help me. I think it's an easy question: I want to write a program that searches for words in a file.
char *such = "Ingo";
char *fund;
FILE *datei;
char text[100];
datei = fopen("names.txt", "r");
if (datei == NULL) {
printf("Fehler\n");
}
else
{
fscanf(datei, "%100c", text);
text[100] = '\0';
//i think this dont work
if (fgets(text, 100, datei) != NULL)
{
printf("%s \n", text);
}
}
return 0;
The file contains this:
Ingo Test Test 123 Test Ingo Ingo
Now I want to count how often the name "Ingo" appears in the file.
Is it also possible to search for several words, for example "ingo" and "test", and count them?
Upvotes: 1
Views: 4238
Reputation: 84551
There are a whole lot of conditions you should test for to ensure you are only matching whole words, etc. The following is one approach to searching for jury and only matching jury and jury's, but not injury. You should also consider whether you want to match plurals for a word or not (e.g. review and reviews). Below, a single collection of delimiters (delim) is used to ensure you match whole words. You could easily break that into two and have a beginning and ending set if you wanted to match plurals or various other suffixes.
The code expects the filename to search as the first argument and the search term (sterm) as the second. (If no arguments are given, it will search the text on stdin for 'the'.) The code reads each line in the file into a temporary buffer called line and then searches each character in line for the beginning character in sterm. If found, the previous character is checked to ensure it is a delimiter (the start of the line also counts), and then the character following the word (by sterm length) is checked to ensure it is also a delimiter. If it is a word that starts with the same character as sterm and is delimited before and after, then the contents are compared using strncmp.
If all conditions are satisfied, count is incremented and the match is printed along with its line number and zero-based position in line. This is just a basic whole-word search that has not been optimized, but it should give you a starting place for discriminating whole words from lesser included substrings (i.e. searching for 'the' will not also match 'them', 'then', 'they', etc.). You can also turn this code into a function that saves the line number and position of each match in an array of structs and returns a pointer to that array. That way you can parse your text and get back the line and position of every match (that's for another day).
Look over the code and let me know if you have questions. If you are not concerned with only matching whole words, then you could simply call strstr repeatedly on each line while advancing a pointer to count the occurrences of the search term. Whatever best meets your needs.
#include <stdio.h>
#include <string.h>

#define MAXS 256

int main (int argc, char **argv)
{
    char line[MAXS] = {0};  /* line buffer for fgets */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;
    char *sterm = argc > 2 ? argv[2] : "the";
    char *delim = " \t\n\'\".";
    size_t count = 0, idx = 0, slen = strlen (sterm);

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAXS, fp))
    {
        size_t i, llen = strlen (line);
        idx++;                      /* 1-based line number */
        if (llen < slen)
            continue;   /* line shorter than search term */
        for (i = 0; i < llen - slen + 1; i++) {
            if (line[i] != *sterm)
                continue;   /* char != first char in sterm */
            if (i && !strchr (delim, line[i-1]))
                continue;   /* prior char is not a delim */
            if (!strchr (delim, line[i+slen]))
                continue;   /* next char is not a delim (or '\0') */
            if (strncmp (&line[i], sterm, slen))
                continue;   /* chars don't match sterm */
            /* i is already a size_t, so it matches %zu */
            printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                    idx, ++count, sterm, i);
        }
    }
    if (fp != stdin) fclose (fp);

    printf ("\n total occurrences of '%s' in '%s' : %zu\n\n",
            sterm, argc > 1 ? argv[1] : "stdin", count);

    return 0;
}
Sample File
$ cat dat/damages.txt
Personal injury damage awards are unliquidated
and are not capable of certain measurement; thus, the
jury has broad discretion in assessing the amount of
damages in a personal injury case. Yet, at the same
time, a factual sufficiency review insures that the
evidence supports the jury's award; and, although
difficult, the law requires appellate courts to conduct
factual sufficiency reviews on damage awards in
personal injury cases. Thus, while a jury has latitude in
assessing intangible damages in personal injury cases,
a jury's damage award does not escape the scrutiny of
appellate review.
Because Texas law applies no physical manifestation
rule to restrict wrongful death recoveries, a
trial court in a death case is prudent when it chooses
to submit the issues of mental anguish and loss of
society and companionship. While there is a
presumption of mental anguish for the wrongful death
beneficiary, the Texas Supreme Court has not indicated
that reviewing courts should presume that the mental
anguish is sufficient to support a large award. Testimony
that proves the beneficiary suffered severe mental
anguish or severe grief should be a significant and
sometimes determining factor in a factual sufficiency
analysis of large non-pecuniary damage awards.
Output
$ ./bin/searchterm dat/damages.txt jury
line[ 3] match 1. 'jury' at location 0
line[ 6] match 2. 'jury' at location 22
line[ 9] match 3. 'jury' at location 37
line[11] match 4. 'jury' at location 2
total occurrences of 'jury' in 'dat/damages.txt' : 4
or
$ ./bin/searchterm <dat/damages.txt
line[ 2] match 1. 'the' at location 50
line[ 3] match 2. 'the' at location 39
line[ 4] match 3. 'the' at location 43
line[ 5] match 4. 'the' at location 48
line[ 6] match 5. 'the' at location 18
line[ 7] match 6. 'the' at location 11
line[11] match 7. 'the' at location 38
line[17] match 8. 'the' at location 10
line[19] match 9. 'the' at location 34
line[20] match 10. 'the' at location 13
line[21] match 11. 'the' at location 42
line[23] match 12. 'the' at location 12
total occurrences of 'the' in 'stdin' : 12
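The idea of returning an array of structs with the line and position of each match could look roughly like the sketch below. The names match_t and find_matches are made up for illustration, the delimiter set is the same one used above, and lines longer than MAXS - 1 characters would be split across reads just as in the main program. Treat it as a starting point under those assumptions, not a finished implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXS 256

/* Hypothetical record for one whole-word match. */
typedef struct {
    size_t line;    /* 1-based line number */
    size_t pos;     /* zero-based offset within the line */
} match_t;

/* Return a malloc'd array of whole-word matches of sterm in fp and store
 * the number of matches in *n. Returns NULL (with *n == 0) if nothing
 * matched or if allocation failed. The caller frees the result. */
match_t *find_matches (FILE *fp, const char *sterm, size_t *n)
{
    const char *delim = " \t\n\'\".";
    char line[MAXS];
    size_t slen, idx = 0, cap = 0;
    match_t *m = NULL;

    assert (fp != NULL && sterm != NULL && n != NULL);
    slen = strlen (sterm);
    *n = 0;
    if (slen == 0)
        return NULL;                /* nothing sensible to search for */

    while (fgets (line, MAXS, fp)) {
        size_t llen = strlen (line);
        idx++;
        for (size_t i = 0; i + slen <= llen; i++) {
            if (i && !strchr (delim, line[i-1]))
                continue;           /* prior char is not a delim */
            if (!strchr (delim, line[i+slen]))
                continue;           /* next char is not a delim (or '\0') */
            if (strncmp (&line[i], sterm, slen))
                continue;           /* chars don't match sterm */
            if (*n == cap) {        /* grow the array when it is full */
                size_t ncap = cap ? cap * 2 : 8;
                match_t *tmp = realloc (m, ncap * sizeof *tmp);
                if (!tmp) { free (m); *n = 0; return NULL; }
                m = tmp;
                cap = ncap;
            }
            m[*n].line = idx;
            m[*n].pos = i;
            (*n)++;
        }
    }
    return m;
}
```

A caller would pass an open FILE * and a size_t to receive the count, loop over the returned array to print or process each match, and then free it.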
Using a pointer instead of array index notation
You may find it a bit more natural to use a pointer instead of array index notation (e.g. using char *p = line; and advancing p, instead of using line[X] notation). If so, you can replace the read loop with the following:
while (fgets (line, MAXS, fp))
{
    char *p = line;
    size_t llen = strlen (line);
    idx++;                      /* 1-based line number */
    if (llen < slen)
        continue;   /* line shorter than search term */
    for (; p < (line + llen - slen + 1); p++) {
        if (*p != *sterm)
            continue;   /* char != first char in sterm */
        if (p > line && !strchr (delim, *(p - 1)))
            continue;   /* prior char is not a delim */
        if (!strchr (delim, *(p + slen)))
            continue;   /* next char is not a delim (or '\0') */
        if (strncmp (p, sterm, slen))
            continue;   /* chars don't match sterm */
        /* p - line is a ptrdiff_t; cast it to match %zu */
        printf (" line[%2zu] match %2zu. '%s' at location %zu\n",
                idx, ++count, sterm, (size_t)(p - line));
    }
}
The pointer notation is probably a bit more natural in C. Let me know if you have any questions.
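If whole-word matching does not matter, the strstr approach mentioned earlier can be sketched as follows. The function name count_substr is hypothetical, and it counts non-overlapping occurrences only, so searching for 'jury' would also count 'injury'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of term in text using strstr.
 * This matches substrings, not whole words. */
size_t count_substr (const char *text, const char *term)
{
    size_t count = 0, tlen;

    assert (text != NULL && term != NULL);
    tlen = strlen (term);
    if (tlen == 0)
        return 0;               /* an empty term would match everywhere */

    /* each hit moves p past the match before searching again */
    for (const char *p = text; (p = strstr (p, term)) != NULL; p += tlen)
        count++;

    return count;
}
```

Inside the fgets loop you would then accumulate the total with something like count += count_substr (line, sterm); instead of the character-by-character checks.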
Upvotes: 3
Reputation: 40145
#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main(void) {
    char *such = "Ingo";        /* search term */
    FILE *datei;                /* input file */
    char word[100];
    int counter = 0;

    datei = fopen("names.txt", "r");
    if (datei == NULL) {
        printf("Fehler\n");     /* "Fehler" is German for "error" */
    }
    else
    {
        while (1 == fscanf(datei, "%99s", word)) { /* read word by word */
            /* ingo --> Ingo; cast avoids undefined behavior for negative char values */
            word[0] = (char)toupper((unsigned char)word[0]);
            if (strcmp(word, such) == 0) {
                ++counter;
            }
        }
        fclose(datei);
        if (counter != 0) {
            printf("number of '%s' is %d\n", such, counter);
        }
    }
    return 0;
}
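The same word-by-word loop can be extended to several search words at once, which is what the second part of the question asks about. Below is a rough sketch that compares without regard to case; equal_nocase and count_words are names invented here, and the caller is assumed to have zeroed counts beforehand.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Return 1 if a and b are equal ignoring case, otherwise 0. */
static int equal_nocase (const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower ((unsigned char)*a) != tolower ((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* both strings must end together */
}

/* Read whitespace-separated words from fp and increment counts[i]
 * each time a word equals terms[i], ignoring case. */
void count_words (FILE *fp, const char *terms[], int counts[], size_t nterms)
{
    char word[100];

    assert (fp != NULL && terms != NULL && counts != NULL);
    while (fscanf (fp, "%99s", word) == 1)
        for (size_t i = 0; i < nterms; i++)
            if (equal_nocase (word, terms[i]))
                counts[i]++;
}
```

With the file from the question and the terms "ingo" and "test", both counters should end up at 3. Note that fscanf with %s splits only on whitespace, so a word followed by punctuation (e.g. "Ingo,") would not match.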
Upvotes: 2
Reputation: 1997
There are two very simple ways to accomplish this:
1. In a loop, use fscanf to read words from the file until you reach EOF, and compare each word with the one you're looking for using strcmp (string compare) from string.h.
2. Use two loops: in the outer loop, read characters with fgetc until you reach a delimiter such as a space, \n or \t; in the inner loop, check whether the word you just read is the one you're looking for. You'll need a temporary char array for this.
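One possible reading of the second approach is sketched below. Here the inner loop collects characters with fgetc up to the next delimiter and the outer loop compares each finished word, which is a slight rearrangement of the description above. The name count_word_fgetc and the MAXW limit are assumptions for illustration; longer words are simply truncated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXW 100    /* assumed maximum word length, including '\0' */

/* Count words in fp that are equal to term. Words are split on
 * space, tab and newline; overlong words are truncated to MAXW - 1. */
size_t count_word_fgetc (FILE *fp, const char *term)
{
    char word[MAXW];            /* temporary buffer for the current word */
    size_t count = 0;
    int c;

    assert (fp != NULL && term != NULL);
    do {
        size_t len = 0;
        /* inner loop: collect characters up to the next delimiter */
        while ((c = fgetc (fp)) != EOF && c != ' ' && c != '\t' && c != '\n')
            if (len < MAXW - 1)
                word[len++] = (char)c;
        word[len] = '\0';
        if (len > 0 && strcmp (word, term) == 0)
            count++;            /* a complete word matched */
    } while (c != EOF);

    return count;
}
```

The first approach avoids the manual buffer handling, so this version is mainly useful when you want control over which characters count as delimiters.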
Upvotes: 1