Bryan Wong
Bryan Wong

Reputation: 633

How to extract formatted text in C++?

This might have appeared before, but I couldn't understand how to extract formatted data. Below is my code to extract all text between string "[87]" and "[90]" in a text file.

Apparently, the position of [87] and [90] is the same as indicated in the output.

void ExtractWebContent::filterContent(){
    string str, str1;
    string positionOfCurrency1 = "[87]";
    string positionOfCurrency2 = "[90]";
    size_t positionOfText1, positionOfText2;
    ifstream reading;
    reading.open("file_Currency.txt");
    while (!reading.eof()){ 
        getline (reading, str);

        positionOfText1 = str.find(positionOfCurrency1);
        positionOfText2 = str.find(positionOfCurrency2);
        cout << "positionOfCurrency1 " << positionOfText1 << endl;
        cout << "positionOfCurrency2 " << positionOfText2 << endl;

        //str1= str.substr (positionOfText);
        cout << "String" << str1 << endl;
    }

    reading.close(); 

An Update on the currency file:

[79]More »Brent slips to $102 on worries about euro zone economy

Market Data

 * Currencies

CAPTION: Currencies

      Name      Price    Change % Chg
   [80]USD/SGD
              1.2606     -0.00  -0.13%

                                       USD/SGD [81]USDSGD=X
   [82]EUR/SGD
              1.5242     0.00   +0.11%

                                       EUR/SGD [83]EURSGD=X

Upvotes: 1

Views: 1223

Answers (3)

Mike C
Mike C

Reputation: 1238

Boost.Tokenizer can be helpful for parsing out a string, but it gets a little trickier if those delimiters have to be bracketed numbers like you have them. With the delimieters as described, a regex is probably adequate.

Upvotes: 1

dimatura
dimatura

Reputation: 1250

All that does is concatenate the output of reading and the strings "[1]" and "[2]". I'm guessing this code resulted from a rather literal extrapolation of similar code using scanf. scanf (as well as the rest of C) still works in C++, so if that works for you I would use it.

That said, there are various levels of sophistication at which you can do this. Using regexes is one of the most powerful/flexible ways, but it might be overkill. The quickest way in my opinion is just to do something like:

  • Find index of substring "[1]", i1
  • Find index of substring "[2]", i2
  • get substring between i1+3 and i2.

In code, supposing std::string line has the text:

size_t i1 = line.find("[1]");
size_t i2 = line.find("[2]");
std::string out(line.substr(i1+3, i2));

Warning: no error checking.

Upvotes: 0

pmr
pmr

Reputation: 59811

That really depends on what 'extracting data means'. In simple cases you can just read the file into a string and then use string member functions (especially find and substr) to extract the segment you are interested in. If you are interested in data per line getline is the way to go for line extraction. Apply find and substr as before to get the segment.

Sometimes a simple find wont get you far and you will need a regular expression to do easily get to the parts you are interested in.

Often simple parsers evolve and soon outgrow even regular expressions. This often signals time for the very large hammer of C++ parsing Boost.Spirit.

Upvotes: 2

Related Questions