Cavell219

Reputation: 1

C++ program missing searched string in large .TXT files. Works for smaller .TXT files

I am writing a program to search through a very large text file with C++.

The file is 21 million lines long and is a backup of a system file. I am trying to find the alarms stored in it and print them to a separate text file.

Update from the comments: I am unable to install any outside files or programs, and the program runs on Windows Server 2012.

Currently my code finds the first alarm string when I take a few thousand lines from the text file. But when I run it on the full 1 GB-plus text file, it returns no results; it just skips over the matches. I have tried allocating more memory and also using an array, and neither seemed to work correctly (I could have coded it wrong; I am not the best C++ coder and am learning as I go).

My question is: why would it work on the smaller file? Is it a memory problem? Do I need to store each line as a string as I go and then search that line? Wouldn't that take much longer?

My code is as follows:

// Alarms.cpp : Defines the entry point for the console application.

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

using namespace std;

int main(){

    system("cls");
    string PRIORITY_NAME, line;
    //bool found = false;
    ifstream myfile("fhx.txt");
    ofstream alarmList("alarmlist.txt");
    int counter = 0;

    cout << "Searching for Alarms and sending to AlarmList.txt \n";

    //make sure files are good and open and determine size
    if (myfile.is_open() && alarmList.is_open())
    {
        cout << "File is open \n";
        ifstream file("fhx.txt", ios::binary | ios::ate);
        cout << "The current open file size is " << file.tellg() << " bytes \n";
        system("pause");
    }
    else
    {
        cout << "File is not open \n";
        system("pause");
    }

    cout << "Running \n"; // show program is running for user to see

    // reads the file and searches while there is still a line
    while (getline(myfile, line))
    {
        ++counter;
        cout << counter << "\n"; // print out lines scanned for debug purposes

        // searches the line for PRIORITY_NAME
        if (line.find("PRIORITY_NAME") != string::npos)
        {
            alarmList << line << "\n"; // print results to separate text file
            //getline(myfile, line);
            cout << line << "\n"; // print to console for debug
        }
    }
    alarmList << "\n" << counter << "  lines searched\n";
    system("pause");
}

Here is the output when I run it on the smaller file (under 2,000 lines):

  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"
  PRIORITY_NAME="LOG"

1679 lines searched

Here is a snippet of the file I am searching. It is 21 million lines like this, with the first alarm not appearing until around line 17,000. Unfortunately, I cannot share much more of it than this:

  OPERATOR_SUBSYSTEM
  {
    ENABLED=T
    GLOBAL_ALARM_ACK_GROUP=1
    RESTRICT_WRITES_TO_AREAS=T
    AREA { NAME="AREA_A" }
    AREA { NAME="K-401_SYS" }
    AREA { NAME="UTIL_AUX" }
    AREA { NAME="SIS" }
    AREA { NAME="SIS_F201_MOD" }
    AREA { NAME="SIS_COKER" }
    AREA { NAME="SIS_VRU" }
    AREA { NAME="SIS_F202_MOD" }
    AREA { NAME="SIS_F203_MOD" }
    AREA { NAME="SISCD201_2_SEQ" }
    AREA { NAME="SISCD203_4_SEQ" }
    AREA { NAME="SISCD205_6_SEQ" }
    AREA { NAME="F-201_MOD" }
    AREA { NAME="COKE_CUTTING" }
    AREA { NAME="CRANE" }
    AREA { NAME="FRACT_TWR" }
    AREA { NAME="CD201_2_SEQ" }
    AREA { NAME="ANTI_FOAM" }
    AREA { NAME="MRX_COS" }
    AREA { NAME="FIRE_GAS" }
    AREA { NAME="ABS_STPR" }
    AREA { NAME="BD_SYS" }
    AREA { NAME="C3C4_SPLIT" }
    AREA { NAME="CD203_4_SEQ" }
    AREA { NAME="CD205_6_SEQ" }
    AREA { NAME="DEBUT" }
    AREA { NAME="DRUM_SEQ_OVW" }
    AREA { NAME="F-202_MOD" }
    AREA { NAME="F-203_MOD" }
    AREA { NAME="FEED" }
    AREA { NAME="NAPH_PRETREATER" }
    AREA { NAME="S_E_SYS" }
    AREA { NAME="T-403_AMINE" }
    AREA { NAME="P203_204" }
  }
  REMOTE_OPERATION_NETWORK_SUBSYSTEM
  {
    ENABLED=F
    COMMUNICATION_TYPE=SIMPLEX
    TIMEOUT_INTERVAL=400
    NETWORK_TYPE=REMOTE_NETWORK
    ENCRYPTION=F
    NTP_SERVER="0.0.0.0"
    NTP_BACKUP="0.0.0.0"
  }
  TERMINAL_SERVER_SUBSYSTEM
  {
    ENABLED=T
  }
  VIRTUAL_SIS_NETWORK
  {
  }
  ATTRIBUTE_INSTANCE NAME="ADVISE_ALM"
  {
    VALUE
    {
      PRIORITY_NAME="LOG"
      ENAB=T
      INV=F
      ATYP="Change From Normal"
      MONATTR=""
      ALMATTR="ADVISE_ALM"
      LIMATTR=""
      PARAM1=""
      PARAM2=""
      SUPPTIMEOUT=1438560
      MASK=65535
      ISDEFAULTMASK=T
      ALARM_FUNCTIONAL_CLASSIFICATION=0
    }
    EXPLICIT_OVERRIDE=T
    VALUE_CHANGED=T
    HAS_DEFAULT_VALUE=F
  }

Any help is greatly appreciated. I am open to trying and learning anything. I was wondering if I need to use std::vector, but I am still reading about how to use it correctly.

Upvotes: 0

Views: 211

Answers (2)

basav

Reputation: 1495

    // reads the file and searches while there is still a line
    while (getline(myfile, line))
    {
        ++counter;
        cout << counter << "\n"; // print out lines scanned for debug purposes

        // searches the line for PRIORITY_NAME
        if (line.find("PRIORITY_NAME") != string::npos)
        {
            alarmList << line << "\n"; // print results to separate text file
            //getline(myfile, line);
            cout << line << "\n"; // print to console for debug
        }
        line.clear();
    }

If all you want to do in the code above is find "PRIORITY_NAME" line by line, you can clear the line string once you are done with each line. A clear just before the next iteration of the while loop may help:

    line.clear();

Upvotes: 0

Marcus Müller

Reputation: 36352

Allocating memory to read in the whole file just to find strings inside it sounds like a very bad idea, and it is unnecessary. I'm pretty sure you also should be using neither ios::ate (which starts at the end of the file instead of the beginning) nor ios::binary (it's a text file...).

I think this is a case of "you don't have to write this, it has already been done"; just use a tool like grep, which should be available for virtually any operating system:

grep "PRIORITY_NAME" fhx.txt > alarmlist.txt

This does exactly what your program is meant to do, is possibly faster, and is well debugged.

Upvotes: 4
