Reputation: 3
I have strings that have HTML tags in them (e.g., "<p>sample_text</p>").
I would like to remove these tags from the strings as seen in the pseudo-code below:
string(string input_string)
{
    int i = 0
    bool is_deleting = False
    while(i < length(input_string))
    {
        if(input_string[i] == "<")
        {
            is_deleting = True
        }
        if(is_deleting == True)
        {
            if(input_string[i] == ">")
            {
                is_deleting = False
            }
            input_string[i] = ""
        }
        i += 1
    }
    return input_string
}
How could I make this work?
Upvotes: 0
Views: 383
Reputation: 67476
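char *removetags(char *str, char opentag, char closetag)
{
    char *write = str, *read = str;   /* compact the string in place */
    int remove = 0;                   /* nonzero while inside a tag */

    while(*read)
    {
        if(*read == closetag && remove)
        {
            read++;                   /* drop the closing delimiter */
            remove = 0;
        }
        /* else-if: re-test *read in the loop condition after a tag closes,
           so a string ending in closetag is never read past its terminator */
        else if(*read == opentag || remove)
        {
            read++;                   /* drop the opener and the tag body */
            remove = 1;
        }
        else
        {
            *write++ = *read++;       /* keep text outside tags */
        }
    }
    *write = 0;
    return str;
}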
Upvotes: 0
Reputation: 84521
You are thinking in the right direction; you have just confused the logic for deleting. While is_deleting is set you are inside a tag, so you only want to copy characters when you are not deleting.
Rather than asking whether you are is_deleting, why not ask whether you are intag? When iterating over characters, being either in a tag (ignoring characters) or not in a tag (copying characters) seems a bit more descriptive.
Regardless, you have 3 conditions for the current character. It is either (1) a '<' indicating a tag opening, where you set your intag flag true, or (2) the intag flag is true, where you skip the character and reset the flag if it is the '>' marking the close of the tag, or (3) intag is false and you copy the character. That logic maps directly onto an if / else if / else chain.
When looping over the characters in any string, there is no need to call strlen(). The nul-terminating character marks the end of the string for you.
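To illustrate the point about the terminator, here is a minimal sketch that walks a string without ever calling strlen(). The name count_char is hypothetical and is used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: count occurrences of c in s.
 * The loop condition s[i] is false at the terminating '\0',
 * so no separate strlen() pass is needed. */
size_t count_char (const char *s, char c)
{
    size_t n = 0;

    assert (s != NULL);

    for (size_t i = 0; s[i]; i++)   /* stop at the nul-terminator */
        if (s[i] == c)
            n++;

    return n;
}
```

For the string "<p>sample_text</p>", count_char (s, '<') returns 2, one for each tag.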
If you put that together, you could do:
#include <stdio.h>

char *rmtags (char *s)
{
    int intag = 0,                      /* flag in-tag 0/1 (false/true) */
        write = 0;                      /* write index */

    for (int i = 0; s[i]; i++) {        /* loop over each char in s */
        if (s[i] == '<')                /* tag opening? */
            intag = 1;                  /* set intag flag true */
        else if (intag) {               /* if inside a tag */
            if (s[i] == '>')            /* tag close */
                intag = 0;              /* set intag false */
        }
        else                            /* not opening & not in tag */
            s[write++] = s[i];          /* copy to write index, increment */
    }
    s[write] = 0;                       /* nul-terminate s */

    return s;                           /* convenience return of s */
}

int main (void) {

    char s[] = "<p>sample_text</p>";

    printf ("text: '%s'\n", rmtags (s));
}
(Note: you don't want to reinvent the wheel to parse HTML. See "Parse html using C" and particularly gumbo-parser. In this limited, simple example it is trivial, but nested tags spanning multiple lines quickly complicate the endeavor. Use a library that validates HTML.)
Example Use/Output
$ ./bin/html_rmtags
text: 'sample_text'
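One caveat: rmtags modifies its argument in place, which is why main declares s as an array. Passing a string literal instead would be undefined behavior, since literals must not be modified. If you need to keep the source intact, a variant that writes into a caller-supplied buffer is one option. The sketch below is only that, a sketch; rmtags_copy, dst and dstsize are names assumed for illustration and are not part of the code above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical variant of rmtags: leaves src untouched and writes the
 * tag-free text to dst, which holds dstsize bytes. Returns the number
 * of characters stored, not counting the nul-terminator. Output is
 * silently truncated if dst is too small. */
size_t rmtags_copy (char *dst, size_t dstsize, const char *src)
{
    int intag = 0;                      /* flag in-tag 0/1 (false/true) */
    size_t write = 0;                   /* write index into dst */

    assert (dst != NULL && src != NULL && dstsize > 0);

    for (size_t i = 0; src[i]; i++) {   /* loop over each char in src */
        if (src[i] == '<')              /* tag opening */
            intag = 1;
        else if (intag) {               /* inside a tag */
            if (src[i] == '>')          /* tag close */
                intag = 0;
        }
        else if (write + 1 < dstsize)   /* room left for char + nul */
            dst[write++] = src[i];
    }
    dst[write] = 0;                     /* nul-terminate dst */

    return write;
}
```

With char buf[32], calling rmtags_copy (buf, sizeof buf, "<p>sample_text</p>") leaves sample_text in buf and returns 11, and the literal is never written to.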
Upvotes: 1