Burak Dağlı

Reputation: 512

How to separate sentence[i] (string) into words (string) in C++

I have a problem. In my project, I read sentences line by line from a dataset file that has one sentence per line. Then I need to split each sentence into words, but I couldn't find out how to do this.

This is the code of the class that reads from the dataset:

class Input{
...
public:
string *word;
string *sentence;
Couple *couple;    // int x , int y  order of sentence and word
int number;
int line;
...
void readInput(string input);
};

This is the code of the read method:

void Input::readInput(string input)
{
    cout << "Reading the " << input << endl;

    ifstream infile;
    infile.open(input.c_str());

    if (!infile.is_open()) {
        cerr << "Unable to open file: " << input << endl << endl;
        exit(-1);
    }

    for (int i = 0; i < line; i++) {
        getline(infile, sentence[i]);
        //infile >> sentence[i];
    }

    for (int j = 0; j < line; j++) {
        // I want to separate sentences into words
    }

    infile.close();
    cout << "Finished Reading the " << input << endl;
}

Upvotes: 0

Views: 3107

Answers (3)

Benjamin Lindley

Reputation: 103693

for(int j=0; j<line; ++j)
{
    std::istringstream iss(sentence[j]);
    for (std::string w; iss >> w; )
    {
        word[number] = w;
        ++number;
    }
}

You'll need to do something about punctuation, though, if you don't want it attached to your words. Simple enough, I think.
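One possible way to handle the punctuation, as a minimal sketch rather than the only approach: split on whitespace as above, then erase every character that std::ispunct flags. The helper name splitWords and the use of std::vector are assumptions made for illustration (the question uses raw string arrays), and the lambda requires C++11.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): splits a sentence on
// whitespace and strips punctuation characters from each token.
std::vector<std::string> splitWords(const std::string& sentence)
{
    std::vector<std::string> words;
    std::istringstream iss(sentence);
    for (std::string w; iss >> w; )
    {
        // Erase-remove idiom: drop every character std::ispunct reports.
        // The cast to unsigned char avoids undefined behavior for negative chars.
        w.erase(std::remove_if(w.begin(), w.end(),
                               [](unsigned char c) { return std::ispunct(c) != 0; }),
                w.end());
        if (!w.empty())   // a token such as "-" becomes empty; skip it
            words.push_back(w);
    }
    return words;
}
```

Note that this also removes apostrophes and hyphens inside words ("It's" becomes "Its"), which may or may not be what you want.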

Upvotes: 4

Rob Marrowstone

Reputation: 1264

If it were me, in the method where you have:

for(int j=0;j<line ;j++){
    // I want to separate sentences into words
}

I would use a regex to match all the words in sentence[j]. Boost.Regex is a library I have used with great success in the past.
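A rough sketch of the regex idea follows. It uses std::regex from the C++11 standard library instead of Boost.Regex, purely so the example has no external dependency. The function name matchWords and the pattern (letters, digits, and apostrophes count as word characters) are assumptions chosen for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every run of letters, digits, or
// apostrophes found in the sentence, in order of appearance.
std::vector<std::string> matchWords(const std::string& sentence)
{
    // Assumed definition of a "word"; adjust the character class as needed.
    static const std::regex wordPattern("[A-Za-z0-9']+");

    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), wordPattern), end;
         it != end; ++it)
    {
        words.push_back(it->str());
    }
    return words;
}
```

Because the pattern describes what a word is rather than what separates words, punctuation such as commas and periods is dropped without a separate clean-up step.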

Upvotes: 1

Specksynder

Reputation: 843

You can loop through the std::string representing each line, looking for end-of-word markers with std::string::find_first_of(). The parameter to find_first_of would be the set of characters that separate words in your text file (space, period, etc.).
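The loop described above might look roughly like this. It pairs find_first_of with find_first_not_of so that runs of consecutive separators are skipped. The function name splitOnDelimiters and the default delimiter set are assumptions for illustration, not something from the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a sentence at any character found in delims.
std::vector<std::string> splitOnDelimiters(const std::string& sentence,
                                           const std::string& delims = " \t.,;:!?")
{
    std::vector<std::string> words;

    // Position of the first character that is not a separator.
    std::string::size_type start = sentence.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // Position of the separator that ends the current word (or npos).
        std::string::size_type end = sentence.find_first_of(delims, start);

        // substr clamps the count, so end == npos takes the rest of the string.
        words.push_back(sentence.substr(start, end - start));

        if (end == std::string::npos)
            break;   // the last word ran to the end of the sentence

        // Skip any run of separators before the next word.
        start = sentence.find_first_not_of(delims, end);
    }
    return words;
}
```

This approach needs only the string header and works with pre-C++11 compilers.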

Upvotes: 0
