Reputation: 11
I am looking to find the number of words that start with d, D, or any other character within a file. Currently I am having trouble counting each instance of a new word. For example, if there are 5 Davids and 3 Dogs within the file, I would want to count each of them individually.
I would prefer something that would not require massive change. Any help is appreciated.
#include<iostream>
#include<fstream> //needed for file opening and closing/manipulation within files
#include<vector> //needed for vectors to store the words from the file
#include<algorithm> //needed for sort algorithm later
using namespace std;
int main(){
string inputName, num, words;
cout<<"Enter a valid filename: "; //Prompting user for a file name in the directory of this program exe
cin>>inputName;
ifstream file(inputName); //Creating a ifstream File which will open the file to the program
vector<string> dWords; //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
vector<string> otherWords;
while(!file.eof()){ //While loop that runs until the file is eof or end of file.
getline(file, words);
while(file>>words){ //Reading each line and extracting into the words variable
if(words[0]=='d'||words[0]=='D'){ //if statement that checks if the first letter in each word starts with a 'd' or 'D'
dWords.push_back(words); //if true then the word gets added to the vector with the push_back
}
else if(words[0]=='"'){ //Checking for a niche case of when a word starts with a "
if(words[1]=='d'||words[0]=='D'){//If true then the same if statement will happen to check for 'd' or 'D'
dWords.push_back(words);
}
}
else{ //This case is for everything not mentioned already
otherWords.push_back(words); //This is added to a different vector than the dWords
}
}
}
dWords.erase(unique(dWords.begin(), dWords.end()));
otherWords.erase(unique(otherWords.begin(), otherWords.end()));
sort(dWords.begin(), dWords.end()); //Using the C++ native sorting method that works with vectors to sort alphabetically
sort(otherWords.begin(), otherWords.end());
cout<<"All words starting with D or d in the file: "<<endl; //printing out the words that start with 'd' or 'D' alphabetically
for(int a=0; a<=dWords.size(); a++){
cout<<dWords[a]<<endl;
}
cout<<endl;
cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
for(int b=0; b<=otherWords.size(); b++){
cout<<otherWords[b]<<endl;
}
file.close(); //closing file after everything is done in program
}
Upvotes: 1
Views: 390
Reputation: 35440
Here is a version that illustrates what I mentioned in the main comments. This code doesn't need an extra vector to store the words that start with D.
#include <iostream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <cctype>
#include <fstream>
#include <string>
int main()
{
std::string words;
std::vector<std::string> dWords;
std::string inputName;
std::cin >> inputName;
std::ifstream file(inputName);
while(file >> words)
{
// remove punctuation
words.erase(std::remove_if(words.begin(), words.end(), [](unsigned char ch)
{ return std::ispunct(ch) != 0; }), words.end());
dWords.push_back(words);
}
// partition D from non-D words
auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s)
{ return toupper(s[0]) == 'D'; });
// output results
std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
std::cout << "Here are the words:\n";
std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));
std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
std::cout << "Here are the words:\n";
std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));
}
This is essentially a program that is about 4 lines.
1) A read of the word,
2) a filtering of the word to remove the punctuation,
3) partitioning the vector,
4) getting the count by using the partition.
Here are the changes:
while(file >> words)
The loop to read in each word is simplified. All that was necessary was to use the >> operator to read each word in a loop.
Remove the punctuation from each word using remove_if and the ispunct lambda. This removes commas, quotes, and other symbols from the word. Once this is done, there is no need for the separate test for a leading double quote (") later on.
words.erase(std::remove_if(words.begin(), words.end(), [](unsigned char ch)
{ return std::ispunct(ch) != 0; }), words.end());
dWords.push_back(words);
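To see the erase/remove_if combination in isolation, here is a small sketch. The function name stripPunctuation is hypothetical; it simply wraps the same idiom so it can be tried on individual strings.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a copy of `word` with every punctuation
// character removed.
std::string stripPunctuation(std::string word)
{
    // std::remove_if moves the characters to keep to the front and returns
    // an iterator to the new logical end; erase() then trims the leftover tail.
    word.erase(std::remove_if(word.begin(), word.end(),
                              [](unsigned char ch) { return std::ispunct(ch) != 0; }),
               word.end());
    return word;
}
```

Note that a token made up only of punctuation (for example a lone dash) becomes an empty string, so it may be worth skipping empty results before storing them.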
We push all the words onto the vector. It doesn't matter if the word starts with D or not. We will take care of that later.
Separate the words that start with D from the words that do not start with D.
This is done by using the std::partition algorithm function. This function places items that match a certain criterion on the left side of the partition, and the items that do not match on the right side of the partition. An iterator is returned, denoting where the partition point is.
In this case, the criterion is "the word starts with D or d": if this is true for a word, it is placed on the left of the partition. Note the use of toupper to test both d and D.
// partition D from non-D words
auto iter = std::partition(dWords.begin(), dWords.end(), [](const std::string& s)
{ return toupper(s[0]) == 'D'; });
Get the count of the number of items on the left and right of the partition.
Since all the items on the left of the partition start with D, it's just a matter of getting the distance from the beginning of the vector up to the partition point iter to get a count of the items.
Likewise, to get a count of the words not starting with D, we count the items from the partition point iter to the end of the vector.
To get the number of items we can use the std::distance function (declared in <iterator>):
// output results
std::cout << "The number of words starting with D: " << std::distance(dWords.begin(), iter) << "\n";
std::cout << "Here are the words:\n";
std::copy(dWords.begin(), iter, std::ostream_iterator<std::string>(std::cout, " "));
std::cout << "\n\nThe number of words not starting with D: " << std::distance(iter, dWords.end()) << "\n";
std::cout << "Here are the words:\n";
std::copy(iter, dWords.end(), std::ostream_iterator<std::string>(std::cout, " "));
The std::copy is just a fancy way of outputting the contents of the vector without writing a loop, so don't let that distract you.
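If only the count is needed and the vector should not be reordered, std::count_if is a possible alternative to partitioning. This is a different technique from the std::partition approach above, shown as a sketch; the name countStartingWith and its letter parameter are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts the words whose first character matches
// `letter`, ignoring case. Empty strings never match.
std::size_t countStartingWith(const std::vector<std::string>& words, char letter)
{
    const int target = std::toupper(static_cast<unsigned char>(letter));
    return static_cast<std::size_t>(
        std::count_if(words.begin(), words.end(),
                      [target](const std::string& s) {
                          return !s.empty() &&
                                 std::toupper(static_cast<unsigned char>(s[0])) == target;
                      }));
}
```

Calling countStartingWith(dWords, 'd') would then give the number of D words while leaving dWords in its original order.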
Here is a live example. The only difference is that cin is used instead of a file.
If you really wanted to separate the vector into two distinct vectors, one with D words and one without, then it is as simple as creating the vectors from the partitioned vector:
std::vector<std::string> onlyDwords(dWords.begin(), iter);
std::vector<std::string> nonDWords(iter, dWords.end());
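One thing to be aware of: std::partition does not promise to keep the words in their original relative order. If the order in the file matters, std::stable_partition can be swapped in. The sketch below assumes that substitution; splitOnD and WordList are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

using WordList = std::vector<std::string>;

// Hypothetical helper: splits `words` into (D words, other words), keeping
// each group in the order the words originally appeared.
std::pair<WordList, WordList> splitOnD(WordList words)
{
    auto iter = std::stable_partition(words.begin(), words.end(),
        [](const std::string& s) {
            return !s.empty() &&
                   std::toupper(static_cast<unsigned char>(s[0])) == 'D';
        });
    // Build the two result vectors from either side of the partition point.
    return { WordList(words.begin(), iter), WordList(iter, words.end()) };
}
```

The first element of the returned pair plays the role of onlyDwords and the second the role of nonDWords.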
Upvotes: 2
Reputation: 84551
Avoiding std::vector altogether and using std::map provides a succinct way to map each word beginning with a given character to the number of times that word occurs in a given block of text.
std::map<std::string, size_t> provides a way to map unique strings to the number of times they occur. The std::string is used as the unique key and the size_t count is used as the value. Since the strings in the map will be unique, you only need to read each word, check if the word begins with the character to find, and then:
mymap[word]++;
After you are done reading words, mymap will hold the number of times each word added to the map occurs. Reading from a file, and naming the map wordfreq, you could do:
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <cctype>
#include <map>
int main (int argc, char **argv) {
/* filename as 1st argument or use "default.txt" by default */
const char *fname = argc > 1 ? argv[1] : "default.txt"; /* filename */
const char c2find = argc > 2 ? tolower(*argv[2]) : 'd'; /* 1st char to find */
std::map<std::string, size_t> wordfreq{};
std::string word; /* string to hold each word */
std::ifstream f (fname); /* open ifstream using fname */
if (!f.is_open()) { /* validate file open for reading */
std::cerr << "error: file open failed '" << fname << "'.\n"
<< "usage: " << argv[0] << " [filename (default.txt)]\n";
return 1;
}
while (f >> word) { /* read each whitespace separate word */
if (tolower(word[0]) == c2find) { /* if word begins with char to find */
wordfreq[word]++; /* increment frequency of word in map */
}
}
for (const auto& w : wordfreq)
std::cout << std::left << std::setw(16) << w.first <<
std::right << w.second << '\n';
}
Example Input File
$ cat default.txt
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
All work and no play makes David a dull boy!
Example Use/Output
$ ./bin/map_word_freq
David 5
dull 5
or for 'a':
$ ./bin/map_word_freq default.txt a
All 5
a 5
and 5
(note: if you want to provide a different character (which is the 2nd argument to the program), you have to provide the filename to read before it)
Look things over and let me know if you have further questions.
Upvotes: 1
Reputation: 1504
In your code you use std::unique to reduce adjacent duplicate words to 1 inside your vectors.
In the body of the question, you state that you'd prefer to count each and every word, so in my version of the code below, I also left copies of the original vectors, and a count summary at the end.
I have also corrected words[1]=='d'||words[0]=='D' so that both comparisons use index 1, as pointed out in the comments section, and tweaked other aspects of the original code (std::vector::erase needs a second iterator as argument):
#include<iostream>
#include<fstream> //needed for file opening and closing/manipulation within files
#include<vector> //needed for vectors to store the words from the file
#include<algorithm> //needed for sort algorithm later
#include<string> //needed for std::string
using namespace std;
int main(){
string inputName, num, words;
cout<<"Enter a valid filename: "; //Prompting user for a file name in the directory of this program exe
cin>>inputName;
ifstream file(inputName); //Creating a ifstream File which will open the file to the program
vector<string> dWords; //Creating 2 vectors, 1 for anything that starts with 'd'/'D' and 2 for anything else
vector<string> otherWords;
while(file>>words){ //Reading the file one word at a time; operator>> skips all whitespace, including newlines, so the outer eof() loop and the getline call (which discarded the first line) are not needed
if(words[0]=='d'||words[0]=='D'){ //if statement that checks if the first letter in each word starts with a 'd' or 'D'
dWords.push_back(words); //if true then the word gets added to the vector with the push_back
}
else if(words[0]=='"'){ //Checking for a niche case of when a word starts with a "
if(words[1]=='d'||words[1]=='D'){//If true then the same if statement will happen to check for 'd' or 'D' --- corrected second condition, from words[0]=='D'
dWords.push_back(words);
}
}
else{ //This case is for everything not mentioned already
otherWords.push_back(words); //This is added to a different vector than the dWords
}
}
// I have added 2 copies of the vectors of strings, in case you intend to count each single word, without reducing adjacent duplicates to 1 with std::unique
vector<string> original_dWords(dWords);
vector<string> original_otherWords(otherWords);
dWords.erase(unique(dWords.begin(), dWords.end()), dWords.end());
otherWords.erase(unique(otherWords.begin(), otherWords.end()), otherWords.end());
sort(dWords.begin(), dWords.end()); //Using the C++ native sorting method that works with vectors to sort alphabetically
sort(otherWords.begin(), otherWords.end());
cout<<"All words starting with D or d in the file: "<<endl; //printing out the words that start with 'd' or 'D' alphabetically
for(unsigned a=0; a<dWords.size(); a++){
cout<<dWords[a]<<endl;
}
cout<<endl;
cout<<"All words not starting with D or d in the file: "<<endl; //printing out every other word/character left
for(unsigned b=0; b<otherWords.size(); b++){
cout<<otherWords[b]<<endl;
}
// added a words count summary
cout << "Number of words beginning with d,D is: " << original_dWords.size() << endl;
cout << "If we leave just one out of consecutive, identical words, that number falls to: " << dWords.size() << endl;
cout << "Number of words not beginning with d,D is: " << original_otherWords.size() << endl;
cout << "If we leave just one out of consecutive, identical words, that number falls to: " << otherWords.size() << endl;
file.close(); //closing file after everything is done in program
}
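Keep in mind that std::unique only collapses duplicates that sit next to each other, so calling it before sort leaves repeated words that are separated by other words. If the goal is a list of distinct words, sorting first is the usual approach. A minimal sketch, with distinctWords as a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the distinct words in sorted order. The
// argument is taken by value, so the caller's vector is left untouched.
std::vector<std::string> distinctWords(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());   // bring equal words together
    // unique() moves one copy of each word to the front; erase() drops the rest.
    words.erase(std::unique(words.begin(), words.end()), words.end());
    return words;
}
```

With this, original_dWords.size() would still give the total count, and distinctWords(original_dWords).size() the number of different D words.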
Upvotes: 0