Reputation: 19
I am trying to write a program which opens a text file, reads words from it, changes upper case to lower case, counts how many times each word occurs in the file, and prints the results into a new text file.
My code so far is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <ctype.h>
#include <string.h>
int main()
{
FILE *fileIN;
FILE *fileOUT;
char str[255];
char c;
int i = 0;
fileIN = fopen ("input.txt", "r");
fileOUT = fopen ("output.txt", "w");
if (fileIN == NULL || fileOUT == NULL)
{
printf("Error opening files\n");
}
else
{
while (fscanf(fileIN, "%254s", str) == 1) //reading loop: one word per pass, stops at end of file
{
for (i = 0; str[i] != '\0'; i++) //changing any upper case to lower case
{
str[i] = (char)tolower((unsigned char)str[i]);
}
printf("%s ", str); //printing output
fprintf(fileOUT, "%s\n", str); //printing into file
}
fclose(fileIN);
fclose(fileOUT);
}
getch();
}
The input.txt file contains the following: "The rain in Spain falls mainly in the plane" (don't ask why). Running the program as is produces output like: the rain in spain falls mainly in the plane
I have managed to lower-case the upper-case words. I am now having trouble understanding how I would count the occurrences of each word. E.g. in the output I would want it to say "the 2", meaning "the" appeared twice; this would also mean that I do not want any further "the" to be stored in that file.
I am thinking of strcmp and strcpy, but I am unsure how to use them the way I want.
Help would be much appreciated.
(Sorry if the formatting is bad)
Upvotes: 0
Views: 2116
Reputation: 40145
A simple sample (it still needs error checking, freeing of the allocated memory, sorting with qsort, etc.):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define BUFFSIZE 1024
typedef struct _wc {
char *word;
int count;
} WordCounter;
WordCounter *WordCounters = NULL;
int WordCounters_size = 0;
void WordCount(char *word){
static int size = 0;
WordCounter *p=NULL;
int i;
if(NULL==WordCounters){
size = 4;
WordCounters = (WordCounter*)calloc(size, sizeof(WordCounter));
}
for(i=0;i<WordCounters_size;++i){
if(0==strcmp(WordCounters[i].word, word)){
p=WordCounters + i;
break;
}
}
if(p){
p->count += 1;
} else {
if(WordCounters_size == size){
size += 4;
WordCounters = (WordCounter*)realloc(WordCounters, sizeof(WordCounter)*size);
}
if(WordCounters_size < size){
p = WordCounters + WordCounters_size++;
p->word = strdup(word);
p->count = 1;
}
}
}
int main(void){
char buff[BUFFSIZE];
char *wordp;
int i;
while(fgets(buff, BUFFSIZE, stdin)){
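for(wordp=buff; *wordp; ++wordp) *wordp = (char)tolower((unsigned char)*wordp); /* strlwr is non-standard, so lower-case by hand */
for(wordp=buff; NULL!=(wordp=strtok(wordp, ".,!?\"'#$%&()=@ \t\n\\;:[]/*-+<>"));wordp=NULL){
if(isalpha((unsigned char)*wordp)){ /* count only tokens that start with a letter */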
WordCount(wordp);
}
}
}
for(i=0;i<WordCounters_size;++i){
printf("%s:%d\n", WordCounters[i].word, WordCounters[i].count);
}
return 0;
}
Demo:
>WordCount.exe
The rain in Spain falls mainly in the plane
^Z
the:2
rain:1
in:2
spain:1
falls:1
mainly:1
plane:1
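The sorting and memory cleanup that this answer leaves out could look roughly like the sketch below. It redeclares a struct with the same shape as WordCounter so that it compiles on its own; the names compare_by_count, sort_word_counters and free_word_counters are made up for illustration, and the ordering (highest count first, ties alphabetical) is just one assumed choice.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as the WordCounter struct in the answer above. */
typedef struct {
    char *word;
    int count;
} WordCounter;

/* Hypothetical comparator for qsort: higher counts first,
   ties broken alphabetically by word. */
static int compare_by_count(const void *a, const void *b)
{
    const WordCounter *x = a;
    const WordCounter *y = b;
    if (x->count != y->count)
        return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Hypothetical helper: sort the first n entries in place. */
void sort_word_counters(WordCounter *table, size_t n)
{
    qsort(table, n, sizeof table[0], compare_by_count);
}

/* Hypothetical helper: release each duplicated word, then the array.
   Only valid when the words and the array came from malloc/strdup. */
void free_word_counters(WordCounter *table, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i)
        free(table[i].word);
    free(table);
}
```

In the main above, one would call sort_word_counters(WordCounters, (size_t)WordCounters_size) before the printing loop and free_word_counters with the same arguments after it.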
Upvotes: 0
Reputation: 20383
You may want to create a hash table with the words as keys and frequencies as values.
Sketch ideas:
At the end, print the contents of the dictionary, i.e. for all entries, entry.word and entry.frequency.
See this question and answer for details: Quick Way to Implement Dictionary in C. It is based on Section 6.6 of the bible, "The C Programming Language".
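As a rough illustration of the hash table idea, here is a minimal chained hash table in the spirit of that K&R section. It is a sketch, not the code from the linked answer: the names Entry, hashtab, lookup, count_word and print_all, and the bucket count HASHSIZE, are assumptions made for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASHSIZE 101  /* assumed bucket count; fine for a small sketch */

typedef struct Entry {
    struct Entry *next;  /* next entry in the same bucket */
    char *word;          /* key */
    int frequency;       /* value */
} Entry;

static Entry *hashtab[HASHSIZE];  /* static storage: all buckets start NULL */

/* Simple multiply-and-add string hash, reduced to a bucket index. */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = (unsigned char)*s++ + 31 * h;
    return h % HASHSIZE;
}

/* Return the entry for word, or NULL if it has not been seen yet. */
Entry *lookup(const char *word)
{
    Entry *e;
    for (e = hashtab[hash(word)]; e != NULL; e = e->next)
        if (strcmp(e->word, word) == 0)
            return e;
    return NULL;
}

/* Count one occurrence of word.
   Returns the new frequency, or 0 if memory ran out. */
int count_word(const char *word)
{
    Entry *e = lookup(word);
    if (e == NULL) {
        unsigned h = hash(word);
        e = malloc(sizeof *e);
        if (e == NULL)
            return 0;
        e->word = malloc(strlen(word) + 1);
        if (e->word == NULL) {
            free(e);
            return 0;
        }
        strcpy(e->word, word);
        e->frequency = 0;
        e->next = hashtab[h];  /* push onto the front of the bucket */
        hashtab[h] = e;
    }
    return ++e->frequency;
}

/* Print every entry as "word frequency", in bucket order (not sorted). */
void print_all(FILE *out)
{
    int i;
    Entry *e;
    for (i = 0; i < HASHSIZE; ++i)
        for (e = hashtab[i]; e != NULL; e = e->next)
            fprintf(out, "%s %d\n", e->word, e->frequency);
}
```

The reading loop would call count_word on each lower-cased word, and print_all(fileOUT) once at the end, so each word is written to the output file only once.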
UPDATE based on OP's comment:
A hash table is just an efficient table. If you do not want to use one, you can still use a plain array as the table. Here are some ideas.
typedef struct WordFreq {
char word[ N ];
int freq;
} WordFreq;
WordFreq wordFreqTable[ T ];
(N is the maximum length of a single word, T is the maximum number of unique words)
For searching and inserting, you can do a linear search in the table:
for( int i = 0; i != T; ++i ) {
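A fuller sketch of that linear search, using the strcmp and strcpy the question mentions, might look like this. The values chosen for N and T, the function name add_word, and the convention that an empty word marks a free slot are all assumptions for the example, not part of the original answer.

```c
#include <string.h>

#define N 32    /* assumed maximum word length, including the '\0' */
#define T 1000  /* assumed maximum number of unique words */

typedef struct WordFreq {
    char word[N];
    int freq;
} WordFreq;

/* Static storage is zero-initialized, so an empty word marks a free slot. */
static WordFreq wordFreqTable[T];

/* Count one occurrence of word. Returns its new frequency, or -1 if the
   word is empty, does not fit in N bytes, or the table is full. */
int add_word(const char *word)
{
    int i;
    if (word[0] == '\0' || strlen(word) >= N)
        return -1;
    for (i = 0; i != T; ++i) {
        if (wordFreqTable[i].word[0] == '\0') {
            /* First free slot reached: the word is new, so store it. */
            strcpy(wordFreqTable[i].word, word);
            wordFreqTable[i].freq = 1;
            return 1;
        }
        if (strcmp(wordFreqTable[i].word, word) == 0)
            return ++wordFreqTable[i].freq;  /* seen before: bump the count */
    }
    return -1;  /* no match and no free slot left */
}
```

Because slots are filled from the front and never removed, the first empty slot also marks the end of the used part of the table, so the search can stop there.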
Upvotes: 1