Reputation: 524
I am very new to C, and I have created a function that removes special characters from a string and returns a new string (without the special characters).
At first glance this seemed to work well. I now need to run the function on the lines of a huge text file (1 million sentences), and after a few thousand lines (about 4,000) I get a segmentation fault.
I don't have much experience with memory allocation and strings in C. I have tried to figure out what the problem with my code is, unfortunately without any luck. Here is the code:
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
char *preproccessString(char *str) {
// Create a new string of the size of the input string, so this might be bigger than needed but should never be too small
char *result = malloc(sizeof(str));
// Array of allowed chars with a 0 on the end to know when the end of the array is reached, I don't know if there is a more elegant way to do this
// Changed from array to string for sake of simplicity
char *allowedCharsArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Initalize two integers
// i will be increased for every char in the string
int i = 0;
// j will be increased every time a new char is added to the result
int j = 0;
// Loop over the input string
while (str[i] != '\0') {
// l will be increased for every char in the allowed chars array
int l = 0;
// Loop over the chars in the allowed chars array
while (allowedCharsArray[l] != '\0') {
// If the char (From the input string) currently under consideration (index i) is present in the allowed chars array
if (allowedCharsArray[l] == toupper(str[i])) {
// Set char at index j of result string to uppercase version of char currently under consideration
result[j] = toupper(str[i]);
j++;
}
l++;
}
i++;
}
return result;
}
Here is the rest of the program, I think the problem is probably here.
int main(int argc, char *argv[]) {
char const * const fileName = argv[1];
FILE *file = fopen(fileName, "r");
char line[256];
while (fgets(line, sizeof(line), file)) {
printf("%s\n", preproccessString(line));
}
fclose(file);
return 0;
}
Upvotes: 3
Views: 6029
Reputation: 16530
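The following proposed code:
- uses strchr() to check whether a character is allowed, rather than looping over the allowed characters by hand
- avoids also checking the terminating NUL byte
- returns allocated memory that the caller must pass to free()
- declares allowedCharsArray as 'static' so it does not have to be re-initialized on each call, and marks it as 'const' to help the compiler catch errors
and now the proposed code: (note: edited per comments)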
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
char *preproccessString(char *str)
{
// Create a new string of the size of the input string, so this might be bigger than needed but should never be too small
char *result = calloc( strlen(str)+1, sizeof( char ) );
if( !result )
{
perror( "calloc failed" );
return NULL;
}
// Array of allowed chars
static const char *allowedCharsArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
// Loop over the input string
for( int j=0, i=0; str[i]; i++)
{
if( strchr( allowedCharsArray, toupper( (unsigned char)str[i] ) ) )
{
// Set char at index j of result string to uppercase version of char currently under consideration
result[j] = (char)toupper( (unsigned char)str[i] );
j++;
}
}
return result;
}
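One detail worth knowing when strchr() is used as a membership test: it treats the terminating null byte as part of the string, so strchr(allowed, '\0') returns a non-null pointer. The loop above never passes '\0' because it stops at the terminator, but a standalone helper should guard against it. A minimal sketch, where is_allowed is a hypothetical name that does not appear in the original code:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not in the original post): returns 1 if c,
   once uppercased, is one of the allowed letters, otherwise 0. */
static int is_allowed(char c)
{
    static const char allowed[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* strchr() would match the terminating null byte, so reject it first. */
    if (c == '\0')
        return 0;

    /* Cast to unsigned char: toupper() is undefined for negative
       values other than EOF. */
    return strchr(allowed, toupper((unsigned char)c)) != NULL;
}
```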
Upvotes: 1
Reputation: 144959
There are some major issues in your code:
- the amount of memory allocated is incorrect: sizeof(str) is the number of bytes in a pointer, not the length of the string it points to, which would also be incorrect. You should write char *result = malloc(strlen(str) + 1);
- the memory allocated in preproccessString is never freed, causing memory leaks and potentially causing the program to run out of memory on very large files.
- you do not set a null terminator at the end of the result string.
Lesser issues:
- you do not check if fopen() succeeded.
- there is a typo in preproccessString, it should be preprocessString.
- you could use isalpha instead of testing every letter.
- you should cast the char values as unsigned char when passing them to toupper, because char may be a signed type and toupper is undefined for negative values except EOF.
Here is a modified version:
#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// transform the string in `str` into buffer dest, keeping only letters and uppercasing them.
char *preprocessString(char *dest, const char *str) {
int i, j;
for (i = j = 0; str[i] != '\0'; i++) {
if (isalpha((unsigned char)str[i]))
dest[j++] = toupper((unsigned char)str[i]);
}
dest[j] = '\0';
return dest;
}
int main(int argc, char *argv[]) {
char line[256];
char dest[256];
char *filename;
FILE *file;
if (argc < 2) {
fprintf(stderr, "missing filename argument\n");
return 1;
}
filename = argv[1];
if ((file = fopen(filename, "r")) == NULL) {
fprintf(stderr, "cannot open %s: %s\n", filename, strerror(errno));
return 1;
}
while (fgets(line, sizeof(line), file)) {
printf("%s\n", preprocessString(dest, line));
}
fclose(file);
return 0;
}
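To make the first point in the list concrete, here is a small sketch of the size that should be requested. The helper name bytes_needed is an assumption made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes required to hold a copy of str,
   including the terminating null byte. */
size_t bytes_needed(const char *str)
{
    /* sizeof(str) here would be sizeof(const char *), typically 4 or 8,
       no matter how long the string is. */
    return strlen(str) + 1;
}
```

For a 40-character line, bytes_needed returns 41, whereas sizeof(str) stays at the pointer size, so a buffer sized with malloc(sizeof(str)) overflows as soon as more letters than that are written to it.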
Upvotes: 1
Reputation: 781592
You have several problems.
1. sizeof(str) is the size of a pointer, not the length of the string. You need to use char *result = malloc(strlen(str) + 1); the + 1 is for the terminating null byte.
2. You never add a terminating null byte to the result string. Add result[j] = '\0'; before return result;
3. Once you find that the character matches an allowed character, there's no need to keep looping through the rest of the allowed characters. Add break after j++.
4. Your main() function is never freeing the results of preprocessString(), so you might be running out of memory.
while (fgets(line, sizeof(line), file)) {
char *processed = preproccessString(line);
printf("%s\n", processed);
free(processed);
}
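Taken together, the fixes for sizeof, the missing null terminator, and the missing break might look like the sketch below. It keeps the original function name (typo included) so that it drops into the loop above. The NULL check on malloc, the const qualifier on the parameter, and the unsigned char cast are additions that this answer does not mention.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the original function with the fixes applied.
   The caller owns the returned buffer and must free() it. */
char *preproccessString(const char *str)
{
    const char *allowedCharsArray = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    /* strlen(str) + 1: room for every character plus the null terminator */
    char *result = malloc(strlen(str) + 1);
    size_t j = 0;

    if (result == NULL)   /* addition: malloc can fail */
        return NULL;

    for (size_t i = 0; str[i] != '\0'; i++) {
        char upper = (char)toupper((unsigned char)str[i]);
        for (size_t l = 0; allowedCharsArray[l] != '\0'; l++) {
            if (allowedCharsArray[l] == upper) {
                result[j++] = upper;
                break;   /* match found, no need to scan the rest */
            }
        }
    }
    result[j] = '\0';   /* terminate the result string */
    return result;
}
```

Because this sketch can return NULL, a caller would ideally check the result before printing it.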
You could address most of these problems if you have the caller pass in the result string, instead of allocating it in the function. Just use two char[256] arrays in the main() function.
int main(int argc, char *argv[])
{
char const* const fileName = argv[1];
FILE* file = fopen(fileName, "r");
char line[256], processed[256];
while (fgets(line, sizeof(line), file)) {
preprocessString(line, processed);
printf("%s\n", processed);
}
fclose(file);
return 0;
}
Then just change the function so that the parameters are:
void preprocessString(const char *str, char *result)
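The answer gives only the signature. A body for it could look like the following sketch, which assumes the caller's result buffer is at least as large as the input buffer (true for the two char[256] arrays above). The use of isalpha is borrowed from another answer and assumes the default "C" locale, where it matches exactly A-Z and a-z.

```c
#include <ctype.h>

/* Copy the letters of str into result, uppercased.
   Assumption: result has room for at least strlen(str) + 1 bytes. */
void preprocessString(const char *str, char *result)
{
    int j = 0;

    for (int i = 0; str[i] != '\0'; i++) {
        unsigned char c = (unsigned char)str[i];
        if (isalpha(c))
            result[j++] = (char)toupper(c);
    }
    result[j] = '\0';   /* always terminate, even if nothing was copied */
}
```

With this design nothing is allocated per line, so there is nothing to free and nothing to leak inside the loop.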
Upvotes: 6
Reputation: 11
A good rule of thumb is to make sure there is one free for every malloc/calloc call.
Also, a good tool to keep note of for the future is Valgrind. It's very good at catching these kinds of errors.
Upvotes: 1
Reputation: 318
I think the problem is that you are using malloc, which allocates memory from the heap, and since you call this function again and again without releasing that memory, you are running out of it. To solve this, call free() on the pointer returned by your preprocessString function in your main block:
char *result=preprocessString(inputstring);
//Do whatever you want to do with this result
free(result);
Upvotes: 0