Reputation: 171
I'm trying to parse a simple text file containing assembly instructions. It looks like this:
TOP NOP
VAL INT 0
TAN LA 2,1
This is just a small example to show how it works. I take the first field on each line (the label) and store it in label, then the second field (NOP, INT, and LA here) and store it in opcode.
After that I take the first argument (0 and 2) and store it in arg1. This is where my problem comes in: with my current code, printing arg1 for each line gives this output:
TOP
0
2
Obviously I only want the last two lines. How do I keep TOP from being picked up along with my first arguments? Here is my code:
#include <string>
#include <iostream>
#include <cstdlib>
#include <string.h>
#include <fstream>
#include <stdio.h>
using namespace std;

int main(int argc, char *argv[])
{
    // If no extra file is provided then exit the program with error message
    if (argc <= 1)
    {
        cout << "Correct Usage: " << argv[0] << " <Filename>" << endl;
        exit (1);
    }
    // Array to hold the registers and initialize them all to zero
    int registers [] = {0,0,0,0,0,0,0,0};
    string memory [16000];
    string Symtablelab[1000];
    int Symtablepos[1000];
    string line;
    string label;
    string opcode;
    string arg1;
    string arg2;
    // Open the file that was input on the command line
    ifstream myFile;
    myFile.open(argv[1]);
    if (!myFile.is_open())
    {
        cerr << "Cannot open the file." << endl;
    }
    int counter = 0;
    int i = 0;
    int j = 0;
    while (getline(myFile, line, '\n'))
    {
        if (line[0] == '#')
        {
            continue;
        }
        if (line.length() == 0)
        {
            continue;
        }
        if (line[0] != '\t' && line[0] != ' ')
        {
            string delimeters = "\t ";
            int current;
            int next = -1;
            current = next + 1;
            next = line.find_first_of( delimeters, current);
            label = line.substr( current, next - current );
            Symtablelab[i] = label;
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            opcode = line.substr(current, next - current);
            if (opcode != "WORDS" && opcode != "INT")
            {
                counter += 3;
            }
            if (opcode == "INT")
            {
                counter++;
            }
            delimeters = ", \n\t";
            current = next + 1;
            next = line.find_first_of(delimeters, current);
            arg1 = line.substr(current, next-current);
            cout << arg1<<endl;
            i++;
        }
    }
}
Upvotes: 0
Views: 2173
Reputation: 1219
The problem is how you look for the start of each subsequent word: current = next + 1. You want the start of the word to be the first non-delimiter character, and you want to check whether you're at the end of the line before looking for arguments.
Adding debug information, I see the following:
>> label: start=0 end=3 value="TOP"
>> opcode: start=4 end=4 value=""
>> label: start=0 end=3 value="VAL"
>> opcode: start=4 end=4 value=""
>> label: start=0 end=3 value="TAN"
>> opcode: start=4 end=4 value=""
This tells me each attempt at opcode is finding another delimiter.
You only advance one character past the end of the previous word, so when the words are separated by more than one delimiter, the next line.substr() starts on a delimiter and returns an empty string.
In every lookup after the first one, change:
current = next + 1;
to:
current = line.find_first_not_of(delimeters, next + 1);
This makes it look for the start of the next word after any and all delimiters.
Also, make the lookup of arguments conditional on there being something left on the line (next is -1 when the opcode ran to the end of the line), so wrap it in if (next > 0) { ... }.
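To illustrate the two changes together, here is a minimal sketch of a tokenizer built on find_first_not_of and find_first_of. The function name splitTokens and its default delimiter set are assumptions made for this example, not part of the original code; it also treats the comma as a delimiter for every token, which the original only does for arguments.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): split a line into
// tokens, treating any run of delimiter characters as one separator.
std::vector<std::string> splitTokens(const std::string& line,
                                     const std::string& delims = " \t,")
{
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = line.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the line.
        std::string::size_type end = line.find_first_of(delims, start);
        if (end == std::string::npos) {
            tokens.push_back(line.substr(start));  // last token on the line
            break;
        }
        tokens.push_back(line.substr(start, end - start));
        // Skip the whole run of delimiters before the next token.
        start = line.find_first_not_of(delims, end);
    }
    return tokens;
}
```

With this sketch, a line such as TOP NOP yields two tokens and TAN LA 2,1 yields four, so the caller can check tokens.size() before reading an argument instead of testing next.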
This gives me, with my debugging and your original output (made conditional):
>> label: start=0 end=3 value="TOP"
>> opcode: start=6 end=-1 value="NOP"
>> label: start=0 end=3 value="VAL"
>> opcode: start=6 end=9 value="INT"
>> arg1: start=10 end=-1 value="0"
0
>> label: start=0 end=3 value="TAN"
>> opcode: start=6 end=8 value="LA"
>> arg1: start=9 end=10 value="2"
2
Refactor your parsing/tokenizing out of the main loop so you can focus on it. You might even want to get CppUnit (or similar) to help you test your parsing function. Even without a test framework, it gives you one place to insert debugging output like:
cout << ">> " << whatIsBeingDebugged << ": start=" << current
     << " end=" << next << " value=\"" << value << "\"" << endl;
Making a robust lexical analyzer and parser is the subject of many libraries (lex and yacc, flex and bison, etc.), can be an application of others such as regular expressions, and fills entire college courses. It is work. Just be methodical and thorough, and test pieces in isolation, for example with unit tests in CppUnit (or similar).
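As a rough sketch of what "parsing pulled out of the main loop" could look like, the function below parses one line into a small record. The names ParsedLine and parseLine are hypothetical, and it swaps in std::istringstream for the manual index arithmetic; it assumes the arguments are written without a space after the comma, as in the sample file.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for one parsed source line (names are illustrative).
struct ParsedLine {
    std::string label;
    std::string opcode;
    std::string arg1;
    std::string arg2;
};

// Parse "LABEL OPCODE [ARG1[,ARG2]]". Missing fields are left empty.
ParsedLine parseLine(const std::string& line)
{
    ParsedLine out;
    std::istringstream in(line);
    // operator>> skips any amount of whitespace before each word.
    in >> out.label >> out.opcode;

    std::string args;
    if (in >> args) {                      // e.g. "0" or "2,1"
        std::string::size_type comma = args.find(',');
        out.arg1 = args.substr(0, comma);  // npos here means "to the end"
        if (comma != std::string::npos) {
            out.arg2 = args.substr(comma + 1);
        }
    }
    return out;
}
```

Because the function takes a string and returns a value, it can be checked in isolation with plain assert calls (or a CppUnit test case) without opening a file: parseLine("TOP NOP") should leave arg1 empty, and parseLine("TAN LA 2,1") should give arg1 "2" and arg2 "1".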
Upvotes: 1
Reputation: 6914
This technique has many weaknesses, and you never check any of the results. For example, when you write:
current = next + 1;
you are assuming that there is exactly one delimiter between items! Otherwise you need to skip over all of the delimiters first. And when you write
next = line.find_first_of(delimeters, current);
<something> = line.substr(current, next - current)
you should be certain that find_first_of actually finds something; otherwise it returns npos, which becomes -1 once stored in an int, and next - current will be negative!
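This unchecked -1 also appears to explain why TOP shows up as an argument when the fields are separated by a single space. The sketch below repeats the three lookups from the question for one line; the function name originalArg1 is made up for this illustration, and it assumes the usual behavior where converting npos to int yields -1.

```cpp
#include <string>

// Illustration only: the question's lookup sequence for a single line,
// returning whatever ends up in arg1.
std::string originalArg1(const std::string& line)
{
    std::string delimeters = "\t ";
    int current = 0;
    // End of the label.
    int next = static_cast<int>(line.find_first_of(delimeters, current));
    current = next + 1;
    // End of the opcode; -1 if the opcode runs to the end of the line.
    next = static_cast<int>(line.find_first_of(delimeters, current));
    delimeters = ", \n\t";
    // When next is -1, this resets current to 0: back to the label.
    current = next + 1;
    next = static_cast<int>(line.find_first_of(delimeters, current));
    // A negative length converts to a huge unsigned count, which substr
    // clamps to the end of the string.
    return line.substr(current, next - current);
}
```

For the line TOP NOP there is no delimiter after NOP, so the second lookup yields -1, current wraps back to 0, and the function returns TOP. For VAL INT 0 and TAN LA 2,1 it returns 0 and 2, matching the output in the question.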
If I had to do this job, I would use regex, either from std or from Boost. With regex this task is a piece of cake; just use:
// Requires #include <regex> (C++11)
std::smatch m;
std::regex rx("\\s*(\\w+)\\s+(\\w+)(?:\\s+(\\d+)\\s*(?:,(\\d+))?)?");
if (std::regex_match(line, m, rx)) {
    // we found a match here
    string label = m.str(1);
    string opcode = m.str(2);
    string arg1 = m.str(3), arg2 = m.str(4);  // empty when the argument is absent
}
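For completeness, here is a self-contained sketch of the same regex approach wrapped in a function, using std::smatch as the match-results type. The names Fields and matchLine are assumptions for this example. Note that std::regex_match must match the whole line, so with this pattern a line with trailing whitespace or a comment would be rejected.

```cpp
#include <regex>
#include <string>

// Hypothetical container for the captured fields.
struct Fields {
    std::string label;
    std::string opcode;
    std::string arg1;
    std::string arg2;
};

// Returns true and fills `out` when the whole line matches
// "LABEL OPCODE [ARG1[,ARG2]]" with numeric arguments.
bool matchLine(const std::string& line, Fields& out)
{
    // Same pattern as in the answer; built once and reused.
    static const std::regex rx(
        "\\s*(\\w+)\\s+(\\w+)(?:\\s+(\\d+)\\s*(?:,(\\d+))?)?");
    std::smatch m;
    if (!std::regex_match(line, m, rx)) {
        return false;
    }
    out.label  = m.str(1);
    out.opcode = m.str(2);
    out.arg1   = m.str(3);  // empty string if the group did not participate
    out.arg2   = m.str(4);
    return true;
}
```

A caller would loop over the file with getline as before and skip or report any line for which matchLine returns false. This needs a standard library with a working std::regex (for GCC, version 4.9 or later).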
Upvotes: 2