Laceanu George

Reputation: 58

Split a sentence in words using char pointers

I am working on a function that splits a sentence into words and returns them through a 2D pointer (char**).

I don't want to use any library or alternatives such as std::string, because I want to practice pointers and learn them.

char** sscanf(char* hstring)
{
    int count = 0;
    char* current = hstring;
    while (*current)
    {
        if (*current == ' ')
        {
            count++;
        }
        while (*current == ' ')
        {
            current++;
        }
        if (*current)
            break;
        current++;
    }
    char** result = new char*[count];
    current = hstring;
    char* nstr = new char;
    int c = 0, i = 0;
    while (*current)
    {
        if (!*current) break;
        cout << "t1";
        if (*current == ' ')
        {
            *(++result) = nstr;

            nstr = nullptr;
            nstr = new char;
        }
        cout << "t2";
        while (*current != '/0' && *current == ' ')
        {
            current++;
        }
        cout << "t3";
        while (*current != '/0' && *current != ' ')
        {
            if (!*current) break;
            *(++nstr) = *current;
            current++;
        }
        cout << "t4";
        *nstr = '/0';
        cout << "t5";
    }
    return result;
}

But it behaves very strangely. Sometimes the debugger redirects me to

static size_t __CLRCALL_OR_CDECL length(_In_z_ const _Elem * const _First) _NOEXCEPT // strengthened
        {   // find length of null-terminated string
        return (_CSTD strlen(_First));
        }

with an Access Violation error; other times it stops on a seemingly random line and reports an "Acces Breakout" (sorry if I spelled that wrong).

What I want from you is not simply to repair my code. I want some explanations, because I want to learn this stuff.

Upvotes: 0

Views: 1405

Answers (1)

Antonio Barreto

Reputation: 131

First, some advice

I understand that you are writing this function as an exercise, but since this is C++, I'd like to warn you that raw allocations like new char*[count] are considered bad practice; that's why std::vector and std::array exist.
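For comparison only, here is a minimal sketch of what the standard-library route can look like. The function name split_words is hypothetical, and note one difference from the pointer exercise: operator>> splits on any whitespace (tabs and newlines too), not just on the space character.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the question): splits on whitespace and
// lets std::vector and std::string own all the memory.
std::vector<std::string> split_words(const std::string& sentence)
{
    std::istringstream stream(sentence);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word)      // skips leading whitespace, reads one word
        words.push_back(word);
    return words;               // no new/delete anywhere
}
```

There is nothing to free afterwards: the vector and its strings release their memory when they go out of scope.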

You seem confused about how dynamic allocation works. The statement char* nstr = new char; allocates just one byte (a single char) on the heap, and nothing is guaranteed to be adjacent to it. This means that ++nstr moves nstr past the only byte you own, and writing through it, as *(++nstr) = *current does, is undefined behavior: the pointer may refer to some random invalid location.

There are many other dangerous operations in your code, like calling new several times (which reserves memory) without calling delete once you no longer use the reserved memory (i.e. memory leaks). Having said that, I strongly encourage you to study this subject, for example starting with the ISO C++ FAQ on memory management.

Also, before digging into pointers and dynamic allocation, you should be more comfortable with statements and flow control. I say this because I see some clear misunderstandings, like:
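To make the leak concrete: if a pointer is the only reference to a buffer, overwriting that pointer (with nullptr or with a new allocation) loses the buffer for good. A minimal sketch of releasing before replacing, with a hypothetical helper name:

```cpp
#include <cstddef>

// Hypothetical helper: swaps an old buffer for a new, zero-filled one
// without leaking. Every new[] is matched by exactly one delete[].
char* replace_buffer(char* old_buffer, std::size_t new_size)
{
    delete[] old_buffer;           // release first (delete[] on nullptr is a no-op)
    return new char[new_size]();   // the trailing () zero-initializes the array
}
```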

while (*current) {
    if (!*current) break;
    ...
}

The condition inside the if will always be false, because the while condition is checked just before it and guarantees that *current is non-zero. This means the if body never runs, so the check is completely useless.

Another remark: don't give your functions the same names as standard library ones. sscanf is already taken, so choose another (and more meaningful) one. This will save you some headaches in the future; get used to naming your own functions properly.
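A small sketch of the same kind of loop with the loop condition doing all the work (string_length is a hypothetical name chosen to avoid clashing with strlen):

```cpp
#include <cstddef>

// Hypothetical helper: length of a null-terminated string.
std::size_t string_length(const char* text)
{
    const char* current = text;
    while (*current)   // the loop condition already rejects '\0'...
        ++current;     // ...so no `if (!*current) break;` is needed inside
    return static_cast<std::size_t>(current - text);
}
```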

A guided solution

I'm in a good mood, so I'll go through some steps here. Anyway, if someone is looking for an optimized, ready-to-use solution, see Split a String in C++.

0. Define the steps

Reading your code, I could guess some of your desired steps:

char** split_string(char* sentence)
{
    // Count the number of words in the sentence
    // Allocate memory for the answer (a 2D buffer)
    // Write each word in the output
}

Instead of trying to get them all right at once, why don't you try them one by one? (Notice the function and parameter names, which are clearer in my opinion.)

1. Count the words

You could start with a simple main(), testing your solution. Here is mine (sorry, I couldn't just adapt yours). For those who are optimization-addicted, this is not an optimized solution, but a simple snippet for the OP.

// I'll be using this header and namespace on the next snippets too.
#include <iostream>
using namespace std;

int main()
{
    char sentence[] = " This is    my sentence  ";

    int n_words = 0;
    char *p = sentence;
    bool was_space = true; // see logic below

    // Reading the whole sentence
    while (*p) {
        // Check if it's a space and advance pointer
        bool is_space = (*p++ == ' ');
        if (was_space && !is_space)
            n_words++;        // count as word a 'rising edge'
        was_space = is_space;
    }

    cout << n_words;
}

Test it, make sure you understand why it works. Now, you can move to the next step.
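Once the logic is clear, the same 'rising edge' count can be wrapped in a function so it is easy to check against several inputs. This is only a sketch; the name count_words is an assumption, not something from the question.

```cpp
// Hypothetical helper: counts words separated by one or more spaces.
int count_words(const char* sentence)
{
    int n_words = 0;
    bool was_space = true;   // treat the start of the string like a space
    for (const char* p = sentence; *p; ++p) {
        bool is_space = (*p == ' ');
        if (was_space && !is_space)
            ++n_words;       // space -> non-space transition starts a word
        was_space = is_space;
    }
    return n_words;
}
```

Inputs worth trying are the empty string, a string of only spaces, and a single word with no spaces at all.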

2. Allocate the buffer

Well, you want to allocate one buffer for each word, so we need to know the size of each one of them (I won't discuss whether or not this is a good approach to the sentence-splitting problem). This was not calculated in the previous step, so we can do it now.

int main()
{
    char sentence[] = " This is    my sentence  ";

    ///// Count the number of words in the sentence

    int n_words = 0;
    char *p = sentence;
    bool was_space = true; // see logic below

    // Reading the whole sentence
    while (*p) {
        // Check if it's a space and advance pointer
        bool is_space = (*p++ == ' ');
        if (was_space && !is_space)
            n_words++;        // count as word a 'rising edge'
        was_space = is_space;
    }

    ///// Allocate memory for the answer (a 2D buffer)

    // This is more like C than C++, but you asked for it
    char **words = new char*[n_words];
    char *ini = sentence; // the initial char of each word

    for (int i = 0; i < n_words; ++i) {
        while (*ini == ' ') ini++;         // search next non-space char
        char *end = ini + 1;               // pointer to the end of the word
        while (*end && *end != ' ') end++; // search for \0 or space
        int word_size = end - ini;         // find out the word size by address offset
        ini = end;                         // next for-loop iteration starts
                                           //  at the next word
        words[i] = new char[word_size + 1]; // a whole buffer for one word, +1 for '\0'
        cout << i << ": " << word_size << endl; // debugging
    }

    // Deleting it all, one buffer at a time
    for (int i = 0; i < n_words; ++i) {
        delete[] words[i]; // delete[] is the syntax to delete an array
    }
    delete[] words; // the array of pointers was allocated with new[] too
}

Notice that I'm deleting the allocated buffers inside main(). When you move this logic into your function, the caller of the function becomes responsible for this deallocation, since it will probably use the buffers before deleting them.

3. Assigning each word to its buffer

I think you get the idea. Assign the words and move the logic into a separate function. Update your question with a Minimal, Complete, and Verifiable example if you still have trouble.

I know this is a Q&A forum, but I think this is already a healthy answer to the OP and to others that may pass here. Let me know if I should answer differently.

Upvotes: 4
