roxrook
roxrook

Reputation: 13853

How to find the longest word in a text document?


I'm looking for a way to find the longest word ( base on length ) in a text document using STL and boost. Here is my solution. However, it wasn't good at all, there were too many operations( token, sort .. ). Is there any simpler way to solve this problem?

// utility and memory
#include <utility>
#include <functional>
#include <memory>

#include <locale>
#include <string>

// data structure and algorithm
#include <stack>
#include <vector>
#include <list>
#include <set>
#include <map>
#include <deque>
#include <list>
#include <bitset>
#include <algorithm>
#include <iterator>
// numeric 
#include <complex>
#include <numeric>
#include <valarray>

// input/output
#include <iostream>
#include <iomanip>
#include <ios>
#include <iosfwd>
#include <streambuf>
#include <sstream>

// standard C
#include <cctype>
#include <cmath>
#include <climits>
#include <cstdlib>
#include <ctime>
#include <cassert>
#include <cstring>

// boost
#include <boost/tokenizer.hpp>

int main() {
    std::string str = "Test, test!, test...string";
    boost::char_separator<char> sep( ",!,.-" );
    boost::tokenizer<boost::char_separator<char> > tokens( str, sep );
    std::vector<std::string> res;
    std::copy( tokens.begin(), tokens.end(), std::back_inserter( res ) );
    std::sort( res.begin(), res.end(), []( const std::string& l, const std::string& r ) { return l.length() > r.length(); } );
    std::cout << "longest : " << *res.begin() << "\n";
    return 0;
} 

Best regards,
Chan

Upvotes: 2

Views: 2258

Answers (3)

maxim1000
maxim1000

Reputation: 6375

You can use std::max_element. Just give to it iterator pair and the comparator which you've already written.

Upvotes: 3

Morpheu5
Morpheu5

Reputation: 2801

That's one way, and it works, but there's one problem with it: it runs in O(n lg n), which is the complexity of the sorting algorithm.

An O(n) trivial algorithm would be to scan the tokens one by one and keep track of the longest word at each step. At the end of the loop you'll have one of the longest words in the set. I said one because there may be more than one words with the same length. This way you'd only capture the first you encounter.

You might want to modify the algorithm in order to keep track of all the longest words seen at each step.

Upvotes: 2

Neera
Neera

Reputation: 1587

Rather than storing and sorting the entire array of word tokens, you can just find the length of each token and compare it with the max_length seen so far. This will reduce your space requirements and also eliminate sorting computational complexity.

Upvotes: 1

Related Questions