Reputation: 107
Hey, so I'm having trouble figuring out the code to count the number of unique words. My thought process, in terms of pseudocode, was to first make a vector, something like vector<string> unique_word_list;. Then I would get the program to read each line, so I would have something like while(getline(fin,line)).
The hard part for me is coming up with the code that checks the vector (array) to see if the string is already in there. If it's in there, I just increase the word count (simple enough), but if it's not in there, I add a new element to the vector. I would really appreciate it if someone could help me out here. I feel like this is not hard, but for some reason I can't think of the code for comparing the string with what's inside the array and determining whether it's a unique word or not.
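For reference, the check described above can be sketched with std::find from <algorithm>, which returns the end iterator when the word is not in the vector yet. This is only a minimal sketch: the helper name count_unique_words and the use of std::istringstream to split each line into words are assumptions, not part of the original post, and the comparison is case-sensitive.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: counts distinct whitespace-separated words in a stream
// using the vector-based approach from the question (linear search per word).
std::size_t count_unique_words(std::istream& in)
{
    std::vector<std::string> unique_word_list;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);  // split the line on whitespace
        std::string word;
        while (words >> word) {
            // std::find returns end() when the word has not been seen before
            if (std::find(unique_word_list.begin(), unique_word_list.end(), word)
                    == unique_word_list.end()) {
                unique_word_list.push_back(word);
            }
        }
    }
    return unique_word_list.size();
}
```

Passing an opened std::ifstream (such as fin) to this function would give the count for a file. Each lookup scans the whole vector, so this gets slow for large inputs.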
Upvotes: 1
Views: 1908
Reputation: 56557
I can't help writing an answer that makes use of C++'s beautiful standard library. I'd do it like this, with a std::set:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
int main()
{
    std::ifstream ifile("test.txt");
    std::istream_iterator<std::string> it{ifile};
    std::set<std::string> uniques;
    std::transform(it, {}, std::inserter(uniques, uniques.begin()),
                   [](std::string str) // make it lower case, so case doesn't matter anymore
                   {
                       // go through unsigned char: passing a negative char to tolower is undefined
                       std::transform(str.begin(), str.end(), str.begin(),
                                      [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
                       return str;
                   });
    // display the unique elements
    for (auto&& elem : uniques)
        std::cout << elem << " ";
    // display the size:
    std::cout << std::endl << uniques.size();
}
You can also define a new string type in which you change the char_traits so that comparison becomes case-insensitive. This is the code you'd need (much lengthier than before, but you may end up reusing it); the char_traits class is taken from cppreference.com:
#include <algorithm>
#include <cctype>
#include <string>
#include <set>
#include <fstream>
#include <iterator>
#include <iostream>
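struct ci_char_traits : public std::char_traits<char> {
    // go through unsigned char: passing a negative char to toupper is undefined
    static char to_upper(char ch) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    static bool eq(char c1, char c2) { return to_upper(c1) == to_upper(c2); }
    static bool ne(char c1, char c2) { return to_upper(c1) != to_upper(c2); }
    static bool lt(char c1, char c2) { return to_upper(c1) < to_upper(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        while (n-- != 0) {
            if (to_upper(*s1) < to_upper(*s2)) return -1;
            if (to_upper(*s1) > to_upper(*s2)) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        const char ua = to_upper(a);
        while (n-- != 0) {
            if (to_upper(*s) == ua) return s;
            ++s;
        }
        return nullptr; // char_traits::find must return a null pointer when nothing matches
    }
};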
using ci_string = std::basic_string<char, ci_char_traits>;
// need to overload the insertion and extraction operators,
// otherwise we cannot use them with our new type
std::ostream& operator<<(std::ostream& os, const ci_string& str) {
    return os.write(str.data(), str.size());
}
std::istream& operator>>(std::istream& is, ci_string& str) {
    std::string tmp;
    is >> tmp;
    str.assign(tmp.data(), tmp.size());
    return is;
}
int main()
{
    std::ifstream ifile("test.txt");
    std::istream_iterator<ci_string> it{ifile};
    std::set<ci_string> uniques(it, {}); // that's it
    // display the unique elements
    for (auto&& elem : uniques)
        std::cout << elem << " ";
    // display the size:
    std::cout << std::endl << uniques.size();
}
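If a whole new string type feels too heavy, a lighter alternative (not what the code above does, just a sketch of a different technique) is to keep std::string and give std::set a case-insensitive comparator. The names ci_less and ci_word_set below are made up for illustration.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: orders strings as if both were lower-cased.
struct ci_less {
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            // unsigned char parameters keep std::tolower well-defined
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// A set that treats "Word" and "word" as the same key while storing
// ordinary std::string elements (the first spelling inserted is kept).
using ci_word_set = std::set<std::string, ci_less>;
```

With this, the usual stream operators for std::string keep working, so no extra operator<< or operator>> overloads are needed.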
Upvotes: 3
Reputation: 303047
Don't use a vector; use a container that maintains uniqueness, like std::set or std::unordered_set. Just convert the string to lower case (using std::tolower) before you add it:
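std::set<std::string> words;
std::string next;
while (file >> next) {
    // std::tolower is overloaded, so wrap it in a lambda; the unsigned char
    // parameter avoids undefined behavior for negative char values
    std::transform(next.begin(), next.end(), next.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    words.insert(next);
}
std::cout << "We have " << words.size() << " unique words.\n";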
Upvotes: 6