Reputation: 20174
I have read a few lines of text into an array of C-strings. The lines have an arbitrary number of tab or space-delimited columns, and I am trying to figure out how to remove all the extra whitespace between them. The end goal is to use strtok to break up the columns. This is a good example of the columns:
Cartwright Wendy 93
Williamson Mark 81
Thompson Mark 100
Anderson John 76
Turner Dennis 56
How can I eliminate all but one of the spaces or tabs between the columns so the output looks like this?
Cartwright Wendy 93
Alternatively, can I just replace all of the whitespace between the columns with a different character in order to use strtok? Something like this?
Cartwright#Wendy#93
edit: Multiple great answers, but had to pick one. Thanks for the help all.
Upvotes: 4
Views: 23050
Reputation: 1
The following code reads the input one character at a time. A space that immediately follows another space is skipped; every other character is printed. You can use the same logic for tabs. I hope it helps solve your problem. If there is any problem with this code, please let me know.
#include <stdio.h>

int main(void)
{
    int c, count = 0;   /* count = length of the current run of spaces */

    printf("Please enter your sentence\n");
    while ((c = getchar()) != EOF) {
        if (c != ' ') {
            putchar(c);
            count = 0;
        }
        else {
            count++;
            if (count == 1)
                putchar(c);   /* print only the first space of a run */
        }
    }
    return 0;
}
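As a rough sketch of the "same logic for tabs" remark, the loop can treat a tab exactly like a space. The function name squeeze_stream and the choice to always emit a plain space for a run are my assumptions, not part of the answer above.

```c
#include <stdio.h>

/* Hypothetical helper: copy `in` to `out`, collapsing every run of
   spaces and/or tabs into a single space. Newlines pass through. */
void squeeze_stream(FILE *in, FILE *out)
{
    int c;
    int in_run = 0;   /* nonzero while inside a run of blanks */

    while ((c = getc(in)) != EOF) {
        if (c == ' ' || c == '\t') {
            if (!in_run)
                putc(' ', out);   /* first blank of the run: emit one space */
            in_run = 1;
        } else {
            putc(c, out);
            in_run = 0;
        }
    }
}
```

Calling squeeze_stream(stdin, stdout) gives the same filter behaviour as the program above, with tabs handled as well.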
Upvotes: 0
Reputation: 1
I made a small improvement over John Bode's answer to remove trailing whitespace as well:
#include <ctype.h>

char *squeeze(char *str)
{
    char *r;   /* next character to be read */
    char *w;   /* next character to be written */
    char c;
    int sp, sp_old = 0;

    r = w = str;
    do {
        c = *r;
        /* the cast keeps isspace() defined for negative char values */
        sp = isspace((unsigned char)c);
        if (!sp) {
            if (sp_old && c) {
                // don't add a space at end of string
                *w++ = ' ';
            }
            *w++ = c;
        }
        if (str < w) {
            // don't add space at start of line
            sp_old = sp;
        }
        r++;
    }
    while (c);
    return str;
}

#include <stdio.h>

int main(void)
{
    char test[] = "\t\nThis\nis\ta\f test.\n\t\n";
    //printf("test = %s\n", test);
    printf("squeeze(test) = '%s'\n", squeeze(test));
    return 0;
}
Upvotes: 0
Reputation: 11
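#include <string.h>

/* Removes every space character from str_base in place.
   Note: this deletes all spaces, including single ones between columns. */
char* trimwhitespace(char *str_base) {
    char* buffer = str_base;
    /* continue the search from the last hit instead of rescanning */
    while((buffer = strchr(buffer, ' '))) {
        /* memmove, not strcpy: source and destination overlap */
        memmove(buffer, buffer+1, strlen(buffer+1) + 1);
    }
    return str_base;
}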
Upvotes: 1
Reputation: 39750
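Why not use strtok() directly? There is no need to clean up the input first, because strtok() treats a run of consecutive delimiter characters as a single separator. All you need to do is call strtok() repeatedly until you have 3 non-space tokens, and then you are done!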
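A minimal sketch of that idea, assuming the delimiters are space, tab, carriage return and newline. The helper name split_columns and its parameters are made up for illustration. Note that strtok() writes '\0' bytes into the line, so the line must be a writable array.

```c
#include <string.h>

/* Hypothetical helper: split `line` in place into at most `max_cols`
   tokens and store pointers to them in `cols`. Returns the number of
   tokens found. Runs of delimiters count as one separator. */
int split_columns(char *line, char *cols[], int max_cols)
{
    static const char delims[] = " \t\r\n";
    int n = 0;
    char *tok = strtok(line, delims);

    while (tok != NULL && n < max_cols) {
        cols[n++] = tok;
        tok = strtok(NULL, delims);   /* NULL: continue with the same line */
    }
    return n;
}
```

With a line such as "Cartwright   Wendy \t 93\n" and max_cols set to 3, cols[0], cols[1] and cols[2] end up pointing at "Cartwright", "Wendy" and "93".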
Upvotes: 5
Reputation: 753525
Here's an alternative function that squeezes out repeated space characters, as defined by isspace() in <ctype.h>. It returns the length of the 'squidged' string.
#include <ctype.h>
#include <stddef.h>   /* size_t */

size_t squidge(char *str)
{
    char *dst = str;
    char *src = str;
    char c;

    while ((c = *src++) != '\0')
    {
        if (isspace((unsigned char)c))
        {
            *dst++ = ' ';
            while ((c = *src++) != '\0' && isspace((unsigned char)c))
                ;
            if (c == '\0')
                break;
        }
        *dst++ = c;
    }
    *dst = '\0';
    return(dst - str);
}

#include <stdio.h>
#include <string.h>

int main(void)
{
    char buffer[256];

    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len-1] == '\n')
            buffer[--len] = '\0';   /* strip the trailing newline, if any */
        printf("Before: %zu <<%s>>\n", len, buffer);
        len = squidge(buffer);
        printf("After: %zu <<%s>>\n", len, buffer);
    }
    return(0);
}
Upvotes: 0
Reputation: 123458
The following code modifies the string in place; if you don't want to destroy your original input, you can pass a second buffer to receive the modified string. Should be fairly self-explanatory:
#include <ctype.h>
#include <stdio.h>
#include <string.h>

char *squeeze(char *str)
{
    int r; /* next character to be read */
    int w; /* next character to be written */

    r=w=0;
    while (str[r])
    {
        /* unsigned char is the safe argument type for <ctype.h> functions */
        unsigned char c = (unsigned char)str[r];
        if (isspace(c) || iscntrl(c))
        {
            if (w > 0 && !isspace((unsigned char)str[w-1]))
                str[w++] = ' ';
        }
        else
            str[w++] = str[r];
        r++;
    }
    str[w] = 0;
    return str;
}

int main(void)
{
    char test[] = "\t\nThis\nis\ta\b test.";
    printf("test = %s\n", test);
    printf("squeeze(test) = %s\n", squeeze(test));
    return 0;
}
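The second-buffer variant mentioned at the top of this answer might look roughly like the sketch below. The name squeeze_copy and the dstsize parameter are my own assumptions. The input is left untouched and the output is truncated if the destination is too small.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical non-destructive variant: write the squeezed form of `src`
   into `dst` (capacity `dstsize` bytes, including the terminator).
   Whitespace and control characters collapse to one space; leading
   whitespace is dropped. Returns dst. */
char *squeeze_copy(char *dst, size_t dstsize, const char *src)
{
    size_t w = 0;   /* next index to be written in dst */

    if (dstsize == 0)
        return dst;   /* no room even for the terminator */

    for (; *src != '\0' && w + 1 < dstsize; src++) {
        unsigned char c = (unsigned char)*src;
        if (isspace(c) || iscntrl(c)) {
            if (w > 0 && dst[w - 1] != ' ')
                dst[w++] = ' ';
        } else {
            dst[w++] = (char)c;
        }
    }
    dst[w] = '\0';
    return dst;
}
```

For the test string used above, squeeze_copy(out, sizeof out, "\t\nThis\nis\ta\b test.") leaves "This is a test." in out, provided out has room for at least 16 bytes.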
Upvotes: 2
Reputation: 7738
You could read a line then scan it to find the start of each column. Then use the column data however you'd like.
#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAX_COL 3
#define MAX_REC 512

int main (void)
{
    FILE *input;
    char record[MAX_REC + 1];
    char *scan;
    const char *recEnd;
    char *columns[MAX_COL] = { 0 };
    int colCnt;

    input = fopen("input.txt", "r");
    if (input == NULL) { perror("input.txt"); return 1; }
    while (fgets(record, sizeof(record), input) != NULL)
    {
        memset(columns, 0, sizeof(columns)); // reset column start pointers
        scan = record;
        recEnd = record + strlen(record);
        for (colCnt = 0; colCnt < MAX_COL; colCnt++ )
        {
            while (scan < recEnd && isspace((unsigned char)*scan)) { scan++; } // bypass whitespace
            if (scan >= recEnd) { break; } // no more columns in this record
            columns[colCnt] = scan; // save column start
            while (scan < recEnd && !isspace((unsigned char)*scan)) { scan++; } // bypass column word
            if (scan < recEnd) { *scan++ = '\0'; } // terminate column without stepping past the record
        }
        if (colCnt > 0)
        {
            printf("%s", columns[0]);
            for (int i = 1; i < colCnt; i++)
            {
                printf("#%s", columns[i]);
            }
            printf("\n");
        }
    }
    fclose(input);
    return 0;
}
Note, the code could still use some robust-ification: check for file errors w/ferror; ensure eof was hit w/feof; ensure entire record (all column data) was processed. It could also be made more flexible by using a linked list instead of a fixed array and could be modified to not assume each column only contains a single word (as long as the columns are delimited by a specific character).
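As a rough sketch of the ferror/feof checks mentioned in the note: after the fgets loop ends, the stream can be asked why it stopped. The helper name read_records and its return convention are assumptions for illustration only, and the record processing itself is left out.

```c
#include <stdio.h>

/* Hypothetical helper: read `input` with fgets until it stops and
   report why. Stores the number of successful fgets calls in *count.
   Returns 0 if reading stopped at end of file, -1 on a read error. */
int read_records(FILE *input, long *count)
{
    char record[513];

    *count = 0;
    while (fgets(record, sizeof record, input) != NULL) {
        (*count)++;   /* the column scan would go here */
    }
    if (ferror(input))
        return -1;                 /* the stream reported a read error */
    return feof(input) ? 0 : -1;   /* expect to have reached end of file */
}
```

A line longer than the buffer is returned by fgets in several pieces, so checking each piece for a trailing '\n' would be one way to detect a record that was not read in full.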
Upvotes: 0
Reputation: 881537
Edit: I originally had a malloced workspace, which I thought might be clearer. However, doing it w/o extra memory is almost as simple, and I'm being pushed that way in comments and personal IMs, so, here comes...:-)
void squeezespaces(char* row, char separator) {
    char *current = row;
    int spacing = 0;
    int i;

    for(i=0; row[i]; ++i) {
        if(row[i]==' ') {
            if (!spacing) {
                /* start of a run of spaces -> separator */
                *current++ = separator;
                spacing = 1;
            }
        } else {
            *current++ = row[i];
            spacing = 0;
        }
    }
    *current = 0;
}
Upvotes: 2
Reputation: 75389
If I may voice the "you're doing it wrong" opinion, why not just eliminate the whitespace while reading? Use fscanf(file, "%s", string); to read a "word" (non-whitespace), then read the whitespace that follows it. If it's spaces or tabs, keep reading into the same "line" of data. If it's a newline, start a new entry. It's probably easiest in C to get the data into a format you can work with as soon as possible, rather than trying to do heavy-duty text manipulation.
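One way this could look, as a sketch under my own assumptions: words are at most 63 characters long, and the helper join_columns (a made-up name) writes each record with a caller-chosen separator. A longer word would be split into several pieces by the width limit.

```c
#include <stdio.h>

/* Hypothetical helper: read whitespace-separated words from `in` and
   write each input line to `out` with `sep` between its words. */
void join_columns(FILE *in, FILE *out, char sep)
{
    char word[64];
    int first = 1;   /* nonzero if the next word starts a record */
    int c;

    /* %63s skips leading whitespace, then reads up to 63 non-space chars */
    while (fscanf(in, "%63s", word) == 1) {
        if (!first)
            putc(sep, out);
        fputs(word, out);
        first = 0;

        /* consume the blanks after the word, stopping at anything else */
        while ((c = getc(in)) == ' ' || c == '\t' || c == '\r')
            ;
        if (c == '\n') {
            putc('\n', out);   /* newline ends the current record */
            first = 1;
        } else if (c != EOF) {
            ungetc(c, in);     /* first character of the next word */
        }
    }
    if (!first)
        putc('\n', out);       /* last record had no trailing newline */
}
```

Calling join_columns(stdin, stdout, '#') on the sample data would print records such as Cartwright#Wendy#93. Blank lines are skipped, because %s discards leading whitespace including newlines.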
Upvotes: 11