Reputation: 33
I am trying to write a function that turns all spans of whitespace (i.e., multiple spaces, a newline, a tab, or any continuous sequence of the aforementioned) into a single space.
For example, the following inputs:
"example\tinput\tstring"
"example\ninput\nstring"
"example \tinput \n string"
Would all result in the same output: "example input string"
I currently have the following code based on a similar question here on Stack Overflow (https://stackoverflow.com/a/1217750/6652030, the 2nd answer). It handles sequences of multiple spaces correctly, but it doesn't replace tabs and newlines with spaces as intended. If I pass the first two example inputs, my resulting string is "exampleinputstring". Any thoughts on where I'm going wrong?
void removeExtraWhitespace(char *src, char *dst) {
    for (; *src; ++dst, ++src) {
        *dst = *src;
        if (*src == '\n' || *src == '\t') {
            *src = ' ';
        }
        else if (isspace(*src)) {
            while (!isspace(*(src + 1))) {
                ++src;
            }
        }
    }
    *dst = '\0';
}
Upvotes: 1
Views: 207
Reputation: 2789
You can make the following modification to your code:
void removeExtraWhitespace(char *dst, const char *src) {
    for (; *src; ++dst, ++src) {
        if (isspace(*src)) {
            *dst = ' ';
            while (isspace(*(src + 1))) {
                ++src;
            }
        } else {
            *dst = *src;
        }
    }
    *dst = '\0';
}
For example,
char dst[50];
removeExtraWhitespace(dst, "\t\t\texample\t\t\n input\nstring\n\n ");
printf("%s\n", dst);
example input string
Upvotes: 1
Reputation: 753585
The code I came up with for the problem was:
#include <ctype.h>   /* isspace() */

static
void removeExtraWhiteSpace(char *src, char *dst)
{
    while (*src != '\0')
    {
        if (!isspace((unsigned char)*src))
            *dst++ = *src++;
        else
        {
            *dst++ = ' ';
            while (isspace((unsigned char)*src))
                src++;
        }
    }
    *dst = '\0';
}
For each character, if it isn't a space according to isspace() from <ctype.h>, copy it to the output. If it is a space, copy a space to the output, and skip over any following space characters. When finished, add the null terminator.
The function is static because I make all functions static unless there's a header that declares the function for use in other files. The compilation options I use require this discipline (or a prior extern void removeExtraWhiteSpace(char *src, char *dst) declaration for the function), but static is shorter.
If you want to remove leading and trailing blanks, it isn't much harder:
static
void removeExtraWhiteSpace(char *src, char *dst)
{
    char *tgt = dst;
    while (isspace((unsigned char)*src))
        src++;
    while (*src != '\0')
    {
        if (!isspace((unsigned char)*src))
            *tgt++ = *src++;
        else
        {
            *tgt++ = ' ';
            while (isspace((unsigned char)*src))
                src++;
        }
    }
    *tgt = '\0';
    if (tgt > dst && tgt[-1] == ' ')
        tgt[-1] = '\0';
}
Test code:
#include <stdio.h>   /* printf(), fgets() */
#include <string.h>  /* strcspn() */

static void test_string(char *buffer1)
{
    printf("Before [%s]\n", buffer1);
    char buffer2[1024];
    removeExtraWhiteSpace(buffer1, buffer2);
    printf("After [%s]\n", buffer2);
}

int main(void)
{
    test_string("example\tinput\tstring");
    test_string("example\ninput\nstring");
    test_string("example \tinput \n string");
    test_string(" \t spaces\t \tand tabs\tboth before\t\tand \t \t after \t\t ");
#ifdef GO_INTERACTIVE
    char buffer[1024];
    while (fgets(buffer, sizeof(buffer), stdin) != 0)
    {
        buffer[strcspn(buffer, "\n")] = '\0';  /* strip the trailing newline */
        test_string(buffer);
    }
#endif /* GO_INTERACTIVE */
    return 0;
}
Plain output:
Before [example input string]
After [example input string]
Before [example
input
string]
After [example input string]
Before [example input
string]
After [example input string]
Before [ spaces and tabs both before and after ]
After [ spaces and tabs both before and after ]
With tabs and newlines marked (^I for tabs, ^J for newlines):
Before [example^Iinput^Istring]^J
After [example input string]^J
Before [example^J
input^J
string]^J
After [example input string]^J
Before [example ^Iinput ^J
string]^J
After [example input string]^J
Before [ ^I spaces^I ^Iand tabs^Iboth before^I^Iand ^I ^I after ^I^I ]^J
After [ spaces and tabs both before and after ]^J
Upvotes: 0
Reputation: 153348
How to replace spans of whitespace with a single space in a C string?
Simply keep track of the previous action and test for whitespace. Only one tight loop is needed, with one call to isspace() per character. This also handles leading and trailing whitespace, collapsing each run to a single space.
#include <ctype.h>
#include <stdbool.h>
void removeExtraWhitespace(const char *src, char *dst) {
    bool previous_was_whitespace = false;
    while (*src) {
        if (isspace((unsigned char) *src)) {
            if (!previous_was_whitespace) {
                *dst++ = ' ';
            }
            previous_was_whitespace = true;
        } else {
            *dst++ = *src;
            previous_was_whitespace = false;
        }
        src++;
    }
    *dst = '\0';
}
Any thoughts on where I'm going wrong?
When OP's code encounters a '\n' or '\t', it changes src[], but that never affects dst[], because *dst = *src has already copied the original character.
Also drop the else in the code below. This allows consumption of consecutive whitespace after a '\n' or '\t'. Yet this still has trouble with other whitespace characters such as '\r'.
if (*src == '\n' || *src == '\t') {
*src = ' ';
}
// else if (isspace(*src)) {
if (isspace(*src)) {
This code uses isspace((unsigned char) *src) rather than isspace(*src) as isspace() is only defined for values in the unsigned char range and EOF. With learner programs, it is unusual to encounter negative values for *src, yet they can exist and conversion to the unsigned char range is prudent.
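To illustrate the point about the cast, here is a minimal sketch of a wrapper that performs the conversion in one place. The name is_space_char is hypothetical and does not come from any of the posts here; it only shows the pattern.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper (the name is an assumption for illustration).
 * It converts the char to unsigned char before calling isspace(), so a
 * negative char value is never passed to the <ctype.h> function. */
static bool is_space_char(char c)
{
    return isspace((unsigned char)c) != 0;
}
```

Callers can then test plain char values, including bytes above 127 on platforms where char is signed, without repeating the cast at every call site.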
Upvotes: 0
Reputation: 5856
In essence you are copying a string into a new place while collapsing each run of whitespace (including newline) characters into a single character (if I understood you correctly).
While you are copying over you have three possible "modes of operation":
Putting that into pseudocode, this looks like the following:
bool skipping = false;
for(each char){
    if(iswhite(char)){
        skipping = true;
        continue; // next source char
    }
    // not a whitespace, let's see if we are done skipping
    if(skipping){
        // collapse all those skipped whitespaces into one
        copy_space_into_dest;
    }
    skipping = false;
    copy_char_into_dest;
}
The iswhite() above could be a call to isspace() or your own function that will return true for anything it considers to be a white space (for example if you decide that a _ is a "white space").
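The pseudocode above could be turned into C roughly as follows. This is only a sketch under stated assumptions: the names is_white and collapse_white are hypothetical, and the predicate treats '_' as whitespace purely to mirror the example in the text.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical predicate (an assumption for illustration): treats '_'
 * as whitespace in addition to whatever isspace() accepts. */
static bool is_white(char c)
{
    return c == '_' || isspace((unsigned char)c) != 0;
}

/* Copies src to dst, emitting one space for each run of "white"
 * characters that is followed by a non-white character. A run at the
 * very end of the string is dropped. dst must have room for at least
 * strlen(src) + 1 bytes. */
static void collapse_white(const char *src, char *dst)
{
    bool skipping = false;
    for (; *src != '\0'; ++src) {
        if (is_white(*src)) {
            skipping = true;
            continue;           /* next source char */
        }
        if (skipping) {
            *dst++ = ' ';       /* collapse the skipped run into one space */
        }
        skipping = false;
        *dst++ = *src;
    }
    *dst = '\0';
}
```

With this sketch, the input "example_\tinput \n string" comes out as "example input string". Because a space is only written once the next non-white character is seen, trailing whitespace is dropped, while leading whitespace still becomes a single space.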
Upvotes: 0