doc
doc

Reputation: 898

Cpp: Segmentation fault core dumped

I am trying to write a lexer, when I try to copy isdigit buffer value in an array of char, I get this core dumped error although I have done the same thing with identifier without getting error.

#include<fstream>
#include<iostream>
#include<cctype>
#include <cstring>
#include<typeinfo>

using namespace std;



int isKeyword(char buffer[]){
    char keywords[22][10] = {"break","case","char","const","continue","default", "switch",
                            "do","double","else","float","for","if","int","long","return","short",
                            "sizeof","struct","void","while","main"};
    int i, flag = 0;
    
    for(i = 0; i < 22; ++i){
        if(strcmp(keywords[i], buffer) == 0)
        {
            flag = 1;
            break;
        }
    }
    
    return flag;
}
int isSymbol_Punct(char word)
{
    int flag = 0;
    char symbols_punct[] = {'<','>','!','+','-','*','/','%','=',';','(',')','{', '}','.'};
    for(int x= 0; x< 15; ++x)
    {
        if(word==symbols_punct[x])
           {
               flag = 1;
               break;
           }
            
    }
    return flag;
}

int main()
{
    char buffer[15],buffer1[15];
    char identifier[30][10];
    char number[30][10];
    memset(&identifier[0], '\0', sizeof(identifier));
    memset(&number[0], '\0', sizeof(number));
    char word;
    ifstream fin("program.txt");
    if(!fin.is_open())
    {
        cout<<"Error while opening the file"<<endl;
    }
    int i,k,j,l=0;
    while (!fin.eof())
    {
        word  = fin.get();

        if(isSymbol_Punct(word)==1)
        {
            cout<<"<"<<word<<", Symbol/Punctuation>"<<endl;
        }
       
        if(isalpha(word))
        {        
            buffer[j++] = word;
            // cout<<"buffer: "<<buffer<<endl;
        }
        else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (j != 0))
        {
            buffer[j] = '\0';
            j = 0;
                            
            if(isKeyword(buffer) == 1)
                cout<<"<"<<buffer<<", keyword>"<<endl;
            else
                {
                cout<<"<"<<buffer<<", identifier>"<<endl;
                strcpy(identifier[i],buffer);
                i++;
                }
                    
        } 
           
        else if(isdigit(word))
        {
            buffer1[l++] = word;
            cout<<"buffer: "<<buffer1<<endl;
        }
        else if((word == ' ' || word == '\n' || isSymbol_Punct(word)==1) && (l != 0))
        {
            buffer1[l] = '\0';
            l = 0;
            cout<<"<"<<buffer1<<", number>"<<endl;
            // cout << "Type is: "<<typeid(buffer1).name() << endl;
            strcpy(number[k],buffer1);
            k++;
                        
        } 

    }
    cout<<"Identifier Table"<<endl;
    int z=0;
    while(strcmp(identifier[z],"\0")!=0) 
    {       
        cout <<z<<"\t\t"<< identifier[z]<<endl; 
        z++;
        
    }  
    // cout<<"Number Table"<<endl;
    // int y=0;
    // while(strcmp(number[y],"\0")!=0) 
    // {       
    //     cout <<y<<"\t\t"<< number[y]<<endl; 
    //     y++;
        
    // }   

    
}

I am getting this error when I copy buffer1 in number[k] using strcpy. I do not understand why it is not being copied. When i printed the type of buffer1 to see if strcpy is not generating error, I got A_15, I searched for it, but did not find any relevant information.

Upvotes: 0

Views: 1017

Answers (2)

Zo&#235; Sparks
Zo&#235; Sparks

Reputation: 340

The reason is here (line 56):

int i,k,j,l=0;

You might think that this initializes i, j, k, and l to 0, but in fact it only initializes l to 0. i, j, and k are declared here, but not initialized to anything. As a result, they contain random garbage, so if you use them as array indices you are likely to end up overshooting the bounds of the array in question.

At that point, anything could happen—in other words, this is undefined behavior. One likely outcome, which is probably happening to you, is that your program tries to access memory that hasn't been assigned to it by the operating system, at which point it crashes (a segmentation fault).

To give a concrete demonstration of what I mean, consider the following program:

#include <iostream>

void print_var(std::string name, int v)
{
    std::cout << name << ": " << v << "\n";
}

int main(void)
{
    int i, j, k, l = 0;

    print_var("i", i);
    print_var("j", j);
    print_var("k", k);
    print_var("l", l);

    return 0;
}

When I ran this, I got the following:

i: 32765
j: -113535829
k: 21934
l: 0

As you can see, i, j, and k all came out such that using them as indices into any of the arrays you declared would exceed their bounds. Unless you are very lucky, this will happen to you, too.

You can fix this by initializing each variable separately:

int i = 0;
int j = 0;
int k = 0;
int l = 0;

Initializing each on its own line makes the initializations easier to see, helping to prevent mistakes.

A few side notes:

  • I was able to spot this issue immediately because I have my development environment configured to flag lines that provoke compiler warnings. Using a variable before it's being initialized should provoke such a warning if you're using a reasonable compiler, so you can fix problems like this as you run into them. Your development environment may support the same feature (and if it doesn't, you might consider switching to something that does). If nothing else, you can turn on warnings during compilation (by passing -Wall -Wextra to your compiler or the like—check its documentation for the specifics).
  • Since you declared your indices as int, they are signed integers, which means they can hold negative values (as j did in my demonstration). If you try to index into an array using a negative index, you will end up dereferencing a pointer to a location "behind" the start of the array in memory, so you will be in trouble even with an index of -1 (remember that a C-style array is basically just a pointer to the start of the array). Also, int probably has only 32 bits in your environment, so if you're writing 64-bit code then it's possible to define arrays too large for an int to fully cover, even if you were to index into the array from the middle. For these sorts of reasons, it's generally a good idea to type raw array indices as std::size_t, which is always capable of representing the size of the largest possible array in your target environment, and also is unsigned.
  • You describe this as C++ code, but I don't see much C++ here aside from the I/O streams. C++ has a lot of amenities that can help you guard against bugs compared to C-style code (which has to be written with great care). For example, you could replace your C-style arrays here with instances of std::array, which has a member function at() that does subscripting with bounds checking; that would have thrown a helpful exception in this case instead of having your program segfault. Also, it doesn't seem like you have a particular need for fixed-size arrays in this case, so you may better off using std::vector; this will automatically grow to accommodate new elements, helping you avoid writing outside the vector's bounds. Both support range-based for loops, which save you from needing to deal with indices by hand at all. You might enjoy Bjarne's A Tour of C++, which gives a nice overview of idiomatic C++ and will make all the wooly reference material easier to parse. (And if you want to pick up some nice C habits, both K&R and Kernighan and Pike's The Practice of Programming can save you much pain and tears).

Upvotes: 1

Secundi
Secundi

Reputation: 1190

Some general hints that might help you to avoid your cause of crash totally by design:

  • As this is C++, you should really refer to established C++ data types and schemes here as far as possible. I know, that distinct stuff in terms of parser/lexer writing can become quite low-level but at least for the things you want to achieve here, you should really appreciate that. Avoid plain arrays as far as possible. Use std::vector of uint8_t and/or std::string for instance.
  • Similar to point 1 and a consequence: Always use checked bounds iterations! You don't need to try to be better than the optimizer of your compiler, at least not here! In general, one should always avoid to duplicate container size information. With the stated C++ containers, this information is always provided on data source side already. If not possible for very rare cases (?), use constants for that, directly declared at/within data source definition/initialization.
  • Give your variables meaningful names, declare them as local to their used places as possible.
  • isXXX-methods - at least your ones, should return boolean values. You never return something else than 0 or 1.
  • A personal recommendation that is a bit controversional to be a general rule: Use early returns and abort criteria! Even after the check for file reading issues, you proceed further.
  • Try to keep your functions smart and non-boilerplate! Use sub-routines for distinct sub-tasks!
  • Try to avoid using namespace that globally! Even without exotic building schemes like UnityBuilds, this can become error-prone as hell for huger projects at latest.
  • the arrays keywords and symbols_punct should be at least static const ones. The optimizer will easily be able to recognize that but it's rather a help for you for fast code understanding at least. Try to use classes here to compound the things that belong together in a readable, adaptive, easy modifiable and reusable way. Always keep in mind, that you might want to understand your own code some months later still, maybe even other developers.

Upvotes: 0

Related Questions