Reputation: 1
I am writing a program to search through a very large text file with C++.
The file is 21 million lines long and is a backup of a system file. I am trying to find the alarms that are stored in it and print them to a separate text file.
Update from the comments below: I am unable to install any outside files or programs, and the program runs on Windows Server 2012.
Currently my code finds the first alarm string when I run it on a few thousand lines taken from the text file. But when I run it on the full 1 GB-plus text file, it returns no results; it just skips over the matches. I have tried allocating more memory and also using an array, and neither seemed to work correctly (I could have coded it wrong; I am not the best C++ coder and am learning as I go).
My question is: why would it work on the smaller file? Is it a memory problem? Do I need to store each line as a string as I go and then search that line? Wouldn't that take much longer?
My code is as follows:
// Alarms.cpp : Defines the entry point for the console application.
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main(){
system("cls");
string PRIORITY_NAME, line;
//bool found = false;
ifstream myfile("fhx.txt");
ofstream alarmList("alarmlist.txt");
int counter = 0;
cout << "Searching for Alarms and sending to AlarmList.txt \n";
//make sure files are good and open and determine size
if (myfile.is_open() && alarmList.is_open())
{
cout << "File is open \n";
ifstream file("fhx.txt", ios::binary | ios::ate);
cout << "The current open file size is " << file.tellg() << " bytes \n";
system("pause");
}
else
{
cout << "File is not open \n";
system("pause");
}
cout << "Running \n"; // show program is running for user to see
// reads the file and searches while there is still a line
while (getline(myfile, line))
{
++counter;
cout << counter << "\n"; //print out lines scanned for debug purposes
// searches the file for PRIORITY_NAME
if (line.find("PRIORITY_NAME") != string::npos)
{
alarmList << line << "\n"; // print results to separate text file
//getline(myfile, line);
cout << line << "\n";// print to console for debug
}
}
alarmList << "\n" << counter << " lines searched\n";
system("pause");
}
Here is the printout when I run it on the smaller file of under 2,000 lines:
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
PRIORITY_NAME="LOG"
1679 lines searched
Here is a snippet of the file I am searching. It is 21 million lines like this, with the first alarm not appearing until around line 17,000. Unfortunately, I cannot give out much more of it than this:
OPERATOR_SUBSYSTEM
{
ENABLED=T
GLOBAL_ALARM_ACK_GROUP=1
RESTRICT_WRITES_TO_AREAS=T
AREA { NAME="AREA_A" }
AREA { NAME="K-401_SYS" }
AREA { NAME="UTIL_AUX" }
AREA { NAME="SIS" }
AREA { NAME="SIS_F201_MOD" }
AREA { NAME="SIS_COKER" }
AREA { NAME="SIS_VRU" }
AREA { NAME="SIS_F202_MOD" }
AREA { NAME="SIS_F203_MOD" }
AREA { NAME="SISCD201_2_SEQ" }
AREA { NAME="SISCD203_4_SEQ" }
AREA { NAME="SISCD205_6_SEQ" }
AREA { NAME="F-201_MOD" }
AREA { NAME="COKE_CUTTING" }
AREA { NAME="CRANE" }
AREA { NAME="FRACT_TWR" }
AREA { NAME="CD201_2_SEQ" }
AREA { NAME="ANTI_FOAM" }
AREA { NAME="MRX_COS" }
AREA { NAME="FIRE_GAS" }
AREA { NAME="ABS_STPR" }
AREA { NAME="BD_SYS" }
AREA { NAME="C3C4_SPLIT" }
AREA { NAME="CD203_4_SEQ" }
AREA { NAME="CD205_6_SEQ" }
AREA { NAME="DEBUT" }
AREA { NAME="DRUM_SEQ_OVW" }
AREA { NAME="F-202_MOD" }
AREA { NAME="F-203_MOD" }
AREA { NAME="FEED" }
AREA { NAME="NAPH_PRETREATER" }
AREA { NAME="S_E_SYS" }
AREA { NAME="T-403_AMINE" }
AREA { NAME="P203_204" }
}
REMOTE_OPERATION_NETWORK_SUBSYSTEM
{
ENABLED=F
COMMUNICATION_TYPE=SIMPLEX
TIMEOUT_INTERVAL=400
NETWORK_TYPE=REMOTE_NETWORK
ENCRYPTION=F
NTP_SERVER="0.0.0.0"
NTP_BACKUP="0.0.0.0"
}
TERMINAL_SERVER_SUBSYSTEM
{
ENABLED=T
}
VIRTUAL_SIS_NETWORK
{
}
ATTRIBUTE_INSTANCE NAME="ADVISE_ALM"
{
VALUE
{
PRIORITY_NAME="LOG"
ENAB=T
INV=F
ATYP="Change From Normal"
MONATTR=""
ALMATTR="ADVISE_ALM"
LIMATTR=""
PARAM1=""
PARAM2=""
SUPPTIMEOUT=1438560
MASK=65535
ISDEFAULTMASK=T
ALARM_FUNCTIONAL_CLASSIFICATION=0
}
EXPLICIT_OVERRIDE=T
VALUE_CHANGED=T
HAS_DEFAULT_VALUE=F
}
Any help is greatly appreciated. I am open to trying and learning anything. I was wondering if I need to use "vector" but am still reading about how to use it correctly.
Upvotes: 0
Views: 211
Reputation: 1495
// reads the file and searches while there is still a line
while (getline(myfile, line))
{
++counter;
cout << counter << "\n"; //print out lines scanned for debug purposes
// searches the file for PRIORITY_NAME
if (line.find("PRIORITY_NAME") != string::npos)
{
alarmList << line << "\n"; // print results to separate text file
//getline(myfile, line);
cout << line << "\n";// print to console for debug
}
line.clear();
}
If all you want to do in the above code is find "PRIORITY_NAME" on a line-by-line basis, then you can clear the line string once you are done with each line. Calling clear just before the next iteration of the while loop might help:
line.clear()
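To check whether the extra clear changes anything, here is a small sketch that reuses one string across several std::getline calls without clearing it. The function name readLinesReusingBuffer and the sample text are assumptions made up for this illustration. As far as I know, std::getline erases the target string before it extracts a new line, so the explicit clear should be harmless but is not expected to change the result.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical demo (not from the original post): reads every line of `text`
// into the SAME string object without calling clear(), and returns what the
// string held after each successful read.
std::vector<std::string> readLinesReusingBuffer(const std::string& text)
{
    std::istringstream in(text);
    std::string line = "stale contents";  // deliberately non-empty to start

    std::vector<std::string> seen;
    while (std::getline(in, line)) {
        // `line` holds only the line just read, never leftovers from the
        // previous iteration, even though clear() is never called.
        seen.push_back(line);
    }
    return seen;
}
```

If the strings returned match the input lines exactly, leftover data in line is unlikely to be the reason the large file produces no matches.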
Upvotes: 0
Reputation: 36352
Allocating memory to read in the whole file just to find strings inside it sounds like a very bad idea, and is unnecessary. I'm pretty sure you also should be using neither ios::ate (starting at the end of the file instead of at the beginning) nor binary (it's a text file...).
I think this is a case of "you don't have to write this, it has already been done"; just use a tool like grep, which should be available for virtually any operating system:
grep "PRIORITY_NAME" fhx.txt > alarmlist.txt
This will do exactly what your program should do, would possibly be faster, and is well debugged.
Upvotes: 4