Junior

Reputation: 12002

How to replace an HTML tag with another string using C#?

I have C# code that reads an HTML file and returns its content as a string.

One thing I need to do is parse the HTML string, find all <embed> tags, read the value of each tag's "src" attribute, and then replace the entire <embed> tag with the content of the file that the src attribute points to.

I am using HtmlAgilityPack to parse the HTML.

The only thing I have not been able to work out is how to replace the <embed> tag with another string and then return the new string, with no <embed> tags, to the user.

Here is what I have so far:

    protected string ParseContent(string content)
    {
        if (content != null)
        {
            //Create a new document parser object
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();

            //load the content
            document.LoadHtml(content);

            //Get all embed tags
            IEnumerable<HtmlNode> embedNodes = document.DocumentNode.Descendants("embed");

            //Make sure the content contains at least one <embed> tag
            if (embedNodes.Count() > 0)
            {
                //Process each <embed> tag
                foreach (HtmlNode embedNode in embedNodes)
                {
                    //Make sure there is a source
                    if (embedNode.Attributes.Contains("src"))
                    {
                        //If the file ends with ".html"
                        if (embedNode.Attributes["src"].Value.EndsWith(".html"))
                        {
                            var newContent = GetContent(embedNode.Attributes["src"].Value);

                            //Here I need to be able to replace the entire embedNode with the newContent
                        }

                    }
                }
            }

            return content;
        }

        return null;
    }

    protected string GetContent(string path)
    {

        if (System.IO.File.Exists(path))
        {
            //The file exists, read its content
            return System.IO.File.ReadAllText(path);
        }

        return null;
    }

How can I replace the <embed> tag with a string?

Upvotes: 1

Views: 2682

Answers (2)

COLD TOLD

Reputation: 13599

I think you can get the parent node of the current <embed> node and then replace that child of the parent with a new node:

    var newContent = GetContent(embedNode.Attributes["src"].Value);
    var ParentNodeT = embedNode.ParentNode;
    //Wrap the file content in a <p> element
    var newNodeStr = "<p>" + newContent + "</p>";
    var newNodeT = HtmlNode.CreateNode(newNodeStr);
    ParentNodeT.ReplaceChild(newNodeT, embedNode);
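
For context, here is a minimal self-contained sketch of how the snippet above might fit into a complete method. The names `EmbedReplacer`, `ReplaceEmbeds` and the `resolve` callback are hypothetical and are not part of the question's code. The callback stands in for `GetContent`, and the wrapper is a `<div>` rather than a `<p>` purely as an illustration. It assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class EmbedReplacer
{
    // Hypothetical helper: replaces each <embed src="..."> with a <div>
    // that wraps whatever resolve(src) returns. Embeds without a src,
    // or for which resolve returns null, are left untouched.
    public static string ReplaceEmbeds(string html, Func<string, string> resolve)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Copy the matches into a list first, so the loop works on a
        // snapshot while the document tree is being modified.
        var embedNodes = document.DocumentNode.Descendants("embed").ToList();

        foreach (HtmlNode embedNode in embedNodes)
        {
            string src = embedNode.GetAttributeValue("src", "");
            if (string.IsNullOrEmpty(src))
            {
                continue;
            }

            string replacement = resolve(src);
            if (replacement == null)
            {
                continue;
            }

            // Wrap the content in one element so that a single node is created.
            HtmlNode newNode = HtmlNode.CreateNode("<div>" + replacement + "</div>");

            // Replace through the embed's own parent, so nested embeds work too.
            embedNode.ParentNode.ReplaceChild(newNode, embedNode);
        }

        return document.DocumentNode.OuterHtml;
    }
}
```

It could be called as `EmbedReplacer.ReplaceEmbeds(content, GetContent)`, assuming `GetContent` is accessible from the call site.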

Upvotes: 2

Junior

Reputation: 12002

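I figured it out, thanks to @COLD TOLD, who advised me to convert the enumerable to a list. That way the loop iterates over a snapshot of the nodes while the document itself is being modified.

Here is what I have done: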

    protected string ParseContent(string content)
    {
        if (content != null)
        {
            //Create a new document parser object
            HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();

            //load the content
            document.LoadHtml(content);

            //Get all embed tags
            List<HtmlNode> embedNodes = document.DocumentNode.Descendants("embed").ToList();

            //Make sure the content contains at least one <embed> tag
            if (embedNodes.Count() > 0)
            {
                //Process each <embed> tag
                foreach (HtmlNode embedNode in embedNodes)
                {
                    //Make sure there is a source
                    if (embedNode.Attributes.Contains("src"))
                    {

                        if (embedNode.Attributes["src"].Value.EndsWith(".html"))
                        {
                            //At this point we know that the source of the embed tag is set and it is an html file


                            //Get the full path
                            string embedPath = customBase + embedNode.Attributes["src"].Value;

                            //Read the content of the embedded file
                            string newContent = GetContent(embedPath);

                            if (newContent != null)
                            {
                                //Create placeholder div node
                                HtmlNode newNode = document.CreateElement("div");

                                //At this point we know the file exists, load its content
                                newNode.InnerHtml = HtmlDocument.HtmlEncode(newContent);

                                //Replace the <embed> node through its own parent, since it may be nested below the document root
                                embedNode.ParentNode.ReplaceChild(newNode, embedNode);
                            }
                        }

                    }
                }

                return document.DocumentNode.OuterHtml;
            }

            return content;
        }

        return null;
    }
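
One design choice in the code above is worth spelling out: because the file content is passed through `HtmlDocument.HtmlEncode`, its markup is escaped and will be displayed as text rather than rendered. If the embedded file should instead become live markup, the content can be assigned without encoding. The sketch below contrasts the two options. `EmbedContentDemo`, `BuildPlaceholder` and the `asLiteralText` flag are hypothetical names used only for illustration.

```csharp
using HtmlAgilityPack;

public static class EmbedContentDemo
{
    // Hypothetical helper: builds the placeholder <div> either with the
    // file content escaped (shown as text) or unescaped (parsed as markup).
    public static HtmlNode BuildPlaceholder(
        HtmlDocument document, string fileContent, bool asLiteralText)
    {
        HtmlNode placeholder = document.CreateElement("div");

        placeholder.InnerHtml = asLiteralText
            ? HtmlDocument.HtmlEncode(fileContent) // escaped before assignment
            : fileContent;                         // parsed as child nodes

        return placeholder;
    }
}
```

Only insert unencoded content when the embedded files come from a trusted location, since their markup becomes part of the returned page.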

Upvotes: 2
