UczihaItaczke

Reputation: 11

Extract string from txt file in C

I have to extract every occurrence of "Artykuł" from a txt file, together with the number of the line that contains it. When I try to compile my program I get the error "invalid initializer" for char str[]=line;, so I don't know how I should assign each word from every line separately to a char array.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#define TRUE 1

int getWords(char *base, char target[10][20])
{
    int n=0,i,j=0;
    
    for(i=0;TRUE;i++)
    {
        if(base[i]!=' '){
            target[n][j++]=base[i];
        }
        else{
            target[n][j++]='\0';//insert NULL
            n++;
            j=0;
        }
        if(base[i]=='\0')
            break;
    }
    return n;
    
}
int main()
{
  FILE * fp;
    char * line = NULL;
    size_t len = 0;
    ssize_t read;

    fp = fopen("dyrekt.html", "r");
    if (fp == NULL)
        exit(EXIT_FAILURE);
        while ((read = getline(&line, &len, fp)) != -1) {
    int n; //number of words
    int i; //loop counter 
    char str[]=line;
    char arr[10][20];
    
    n=getWords(str,arr);
    
    for(i=0;i<=n;i++)
        printf("%s\n",arr[i]);
    
    return 0;
}
fclose(fp);
    if (line)
        free(line);
    exit(EXIT_SUCCESS);
    }

Upvotes: 1

Views: 1268

Answers (2)

David C. Rankin

Reputation: 84561

You have a large number of issues to correct. Specifically:

  1. never hardcode filenames or use magic numbers in your code. You should not have to re-compile your program just to read from a different filename. Pass the filename as the first argument to your program (that's what argc and argv are for in int main (int argc, char **argv)), or prompt the user and take the filename as input. Instead of sprinkling magic numbers in your code (10, 20), #define a constant or use a global enum;
  2. you do not need char str[] to begin with, simply pass line to getWords();
  3. you invoke Undefined Behavior reading beyond the end of your array, your loop limits should be 0 <= i < n, so that means for (i = 0; i < n; i++) NOT i <= n;
  4. don't return 0; at the end of your read-loop. That means you exit after only 1 iteration is complete;
  5. In getWords() you must protect your array bounds. What if there are more than 10 words in the line, or more than 19 characters in the word? If you had defined constants for your array bounds, you can simply add the comparison to your loop conditions and if() conditions;
  6. Don't use TRUE for a continuous loop as you loop over base; the proper loop limit is the number of rows in your array. If you have #define ROWS 10, then your i loop is for (i = 0; n < ROWS; i++). You must do the same for your j count. If you #define COLS 20, you would do if (j == COLS - 1 || isspace (base[i])) to protect your column limit while also catching the ' ' for end of word; and
  7. you must know whether you are in a word reading characters, or between words reading spaces. Otherwise, if your line has leading, trailing or multiple consecutive spaces between words, your arr will hold an empty string (just '\0') for every ' ' encountered. You can simply use an int inword = 0; as a flag and set it to 1 (true) when reading characters, or 0 when you encounter a space. Then you only add a word to your array if (inword).

Now to fix the issue, start by declaring the constants for your array:

#define ROWS 10         /* if you need a constant, #define one (or more) */
#define COLS 20

Take the filename to read as the first argument to your program (or read from stdin by default if no argument is provided), e.g.

int main (int argc, char **argv) {
    
    ...
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    // fp = fopen ("dyrekt.html", "r");     /* NEVER hardcode filenames */
    if (fp == NULL) {
        perror ("fopen-file");
        exit (EXIT_FAILURE);
    }

Pass line to getWords(), then fix the loop limits in main() to display the words and remove the unneeded return 0;:

        n = getWords (line, arr);           /* pass line, str not needed */

        for (i = 0; i < n; i++)             /* i < n, not i <= n */
            printf ("%s\n", arr[i]);

        // return 0;    /* what? you exit at end of iteration */

Finally, all of the changes for getWords(), including the checks to protect your array bounds, can be done as follows:

int getWords (char *base, char (*target)[COLS])
{
    int n = 0, i, j = 0, inword = 0;        /* inword is flag for in/out of word */

    for (i = 0; n < ROWS; i++) {                    /* protect array bounds */
        if (j == COLS - 1 || !base[i] ||            /* both ROWS and COLS; '\0' ends a word too */
            isspace ((unsigned char)base[i])) {     /* cast: isspace() needs a non-negative value */
            if (inword) {                           /* check if inword before adding */
                target[n][j++] = '\0';              //insert NULL
                n++;
                inword = j = 0;                     /* reset inword as well as j */
            }
        }
        else {
            target[n][j++] = base[i];
            inword = 1;                             /* set inword true */
        }
        if (!base[i])
            break;
    }
    
    return n;
}

(note: you should add additional error handling to the case where j == COLS - 1 to handle the additional characters that do not fit. You can just discard everything through the next space by advancing i inside that branch, e.g. while (base[i] && !isspace ((unsigned char)base[i])) i++;)

Putting it all together, you would have:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>      /* use isspace() to check whitespace */

#define ROWS 10         /* if you need a constant, #define one (or more) */
#define COLS 20

int getWords (char *base, char (*target)[COLS])
{
    int n = 0, i, j = 0, inword = 0;        /* inword is flag for in/out of word */

    for (i = 0; n < ROWS; i++) {                    /* protect array bounds */
        if (j == COLS - 1 || !base[i] ||            /* both ROWS and COLS; '\0' ends a word too */
            isspace ((unsigned char)base[i])) {     /* cast: isspace() needs a non-negative value */
            if (inword) {                           /* check if inword before adding */
                target[n][j++] = '\0';              //insert NULL
                n++;
                inword = j = 0;                     /* reset inword as well as j */
            }
        }
        else {
            target[n][j++] = base[i];
            inword = 1;                             /* set inword true */
        }
        if (!base[i])
            break;
    }
    
    return n;
}

int main (int argc, char **argv) {
    
    char *line = NULL;
    size_t len = 0;
    ssize_t read;
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    // fp = fopen ("dyrekt.html", "r");     /* NEVER hardcode filenames */
    if (fp == NULL) {
        perror ("fopen-file");
        exit (EXIT_FAILURE);
    }
    
    while ((read = getline (&line, &len, fp)) != -1) {
        int n,                              //number of words
            i;                              //loop counter 
        char arr[ROWS][COLS] = {{0}};       /* initialize arrays */

        n = getWords (line, arr);           /* pass line, str not needed */

        for (i = 0; i < n; i++)             /* i < n, not i <= n */
            printf ("%s\n", arr[i]);

        // return 0;    /* what? you exit at end of iteration */
    }
    fclose (fp);
    
    free (line);        /* no need for if, calling free (NULL) doesn't hurt */
    
    exit (EXIT_SUCCESS);
}

(note: it is a good idea to initialize a 2D array used to hold strings all zero)

Example Input file

$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.

Example Use/Output

Passing the file to read as the first argument:

$ ./bin/artykul dat/captnjack.txt
This
is
a
tale
Of
Captain
Jack
Sparrow
A
Pirate
So
Brave
On
the
Seven
Seas.

Look things over and let me know if you have further questions.

Upvotes: 1

Silamoth

Reputation: 36

@Ramus05's comment hinted at this, but I'll expand here. I made a simple test program to test that element of your code. When I compile the following code:

#include <stdio.h>
int main()
{
    char* test = "this is a test string";
    char str[] = test;
    printf("%s\n", str);
}

I get the same error you did. However, this code compiles:

#include <stdio.h>
int main()
{
    char* test = "this is a test string";
    char* str = test;
    printf("%s\n", str);
}

When I run it, I get the following output:

this is a test string

So that's how you can fix your issue.

But I'm sure you're wondering why that happens. Well, the string variable being copied (test in my case, line in yours) is a pointer. A pointer can point to data of (theoretically) any length. An array, however, is different: it's allocated with a specific length, and C only lets you initialize a char array from a string literal or a brace-enclosed list, not from another variable such as a pointer.

So, for example, the following code is valid in C:

char str[50] = "test string";

This is valid because the string literal "test string" has a fixed length that is known at compile time.

In your case, you were trying to initialize your char[] from something of variable length, and C doesn't allow that. Changing to a char* fixes this since that's a pointer. Alternatively, you could declare a fixed-size array and copy into it with strcpy or, to respect the array bound, strncpy (which does not add the terminating '\0' itself when the source fills the buffer).

Upvotes: 1
