Reputation: 512
I have a problem. In my project, I read sentences line by line from a dataset file that has one sentence per line. Then I need to split each sentence into words, but I can't figure out how to do it.
This is the class that reads from the dataset:
class Input{
...
public:
string *word;
string *sentence;
Couple *couple; // int x , int y order of sentence and word
int number;
int line;
...
void readInput(string input);
};
This is the read method:
void Input::readInput(string input)
{
cout << "Reading the " << input <<endl;
ifstream infile;
infile.open(input.c_str());
if(!infile.is_open()){
cerr << "Unable to open file: " << input << endl << endl;
exit(-1);
}
for(int i=0; i<line ; i++){
getline(infile, sentence[i]);
//infile >> sentence[i];
}
for(int j=0;j<line ;j++){
// I want to separate sentences into words
}
infile.close();
cout << "Finished Reading the " << input <<endl;
}
Upvotes: 0
Views: 3107
Reputation: 103693
for(int j=0; j<line; ++j)
{
std::istringstream iss(sentence[j]);
for (std::string w; iss >> w; )
{
word[number] = w;
++number;
}
}
You'll need to do something about punctuation, though, if you don't want it attached to your words. That should be simple enough, I think.
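One possible way to handle the punctuation is to trim it from both ends of each token after the `iss >> w` extraction. This is only a sketch: `trimPunctuation` and `splitWords` are hypothetical helper names, it returns a `std::vector` instead of filling the question's `word` array, and it assumes that punctuation inside a word (such as the apostrophe in "It's") should be kept.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: removes leading and trailing punctuation from one token.
std::string trimPunctuation(const std::string& token)
{
    std::size_t begin = 0;
    std::size_t end = token.size();
    // The cast avoids undefined behaviour for negative char values.
    while (begin < end && std::ispunct(static_cast<unsigned char>(token[begin])))
        ++begin;
    while (end > begin && std::ispunct(static_cast<unsigned char>(token[end - 1])))
        --end;
    return token.substr(begin, end - begin);
}

// Hypothetical helper: splits on whitespace, then trims each token.
std::vector<std::string> splitWords(const std::string& sentence)
{
    std::vector<std::string> words;
    std::istringstream iss(sentence);
    for (std::string w; iss >> w; )
    {
        const std::string cleaned = trimPunctuation(w);
        if (!cleaned.empty())   // skip tokens that were punctuation only
            words.push_back(cleaned);
    }
    return words;
}
```

Inside the loop over `sentence[j]`, each element of `splitWords(sentence[j])` could then be copied into `word[number++]`, provided `word` has enough room.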
Upvotes: 4
Reputation: 1264
If it were me, then in the method where you have:
for(int j=0;j<line ;j++){
// I want to separate sentences into words
}
I would use a regex to match all the words in sentence[j].
Boost.Regex is a library I have used with great success in the past.
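As a rough illustration of the regex idea, here is a sketch that uses the standard library's `std::regex` (available since C++11) rather than Boost.Regex, so it builds without extra dependencies. The function name `regexWords` and the pattern are my own assumptions: a "word" is taken to be a run of ASCII letters, digits, or apostrophes, which you may need to adjust for your dataset.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every regex match in the sentence as one word.
std::vector<std::string> regexWords(const std::string& sentence)
{
    // Assumed definition of a word: letters, digits, and apostrophes.
    static const std::regex wordPattern("[A-Za-z0-9']+");

    std::vector<std::string> words;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), wordPattern), end;
         it != end; ++it)
    {
        words.push_back(it->str());
    }
    return words;
}
```

Because the pattern only matches word characters, punctuation and whitespace are dropped without a separate cleanup step.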
Upvotes: 1
Reputation: 843
You can try looping through the std::string that represents each line, looking for end-of-word markers with std::string::find_first_of(). The parameter to find_first_of would be the set of characters that separate words in your text file (space, period, etc.).
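A minimal sketch of that approach follows. The name `splitOnDelimiters` and the default delimiter set are assumptions for illustration; it pairs `find_first_of` with `find_first_not_of` so that runs of consecutive delimiters (such as ", ") do not produce empty words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits a sentence at any character in `delimiters`.
std::vector<std::string> splitOnDelimiters(const std::string& sentence,
                                           const std::string& delimiters = " \t.,;:!?")
{
    std::vector<std::string> words;
    // Skip any delimiters at the start of the sentence.
    std::size_t start = sentence.find_first_not_of(delimiters);
    while (start != std::string::npos)
    {
        // End of the current word: the next delimiter, or npos at end of string.
        const std::size_t stop = sentence.find_first_of(delimiters, start);
        // substr clamps the count, so stop == npos takes the rest of the string.
        words.push_back(sentence.substr(start, stop - start));
        if (stop == std::string::npos)
            break;
        // Beginning of the next word, skipping the run of delimiters.
        start = sentence.find_first_not_of(delimiters, stop);
    }
    return words;
}
```

Note that any character not listed in `delimiters` stays part of the word, so an apostrophe or hyphen is kept unless you add it to the set.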
Upvotes: 0