Shah Rukh

Reputation: 227

How to get all the divs with the same class from HTML in HTML Agility Pack

I am getting all the divs that have the same class name, but when I insert them into the database and then use a DataTable to handle the keys, the query select top 1 ID,Link from myTable returns all the records, so something must be wrong with my code. Do we need to use algorithms or patterns to extract the data we need, or can we access it directly once we have it in the HtmlDocument?

I have the full HTML of a website and I am selecting 52 divs that share the same class. When I select them from my database into a DataTable, they have been inserted again and again, and even with a TOP 1 query it returns all the records. What should I do? For example, here is my code for getting the subcategories. I am selecting block1, which contains the records related to my divs but also holds data from other divs.

    // Requires: using System; using HtmlAgilityPack;
    doc.LoadHtml(Result);

    // Every div whose class attribute is exactly "block1"
    HtmlNodeCollection categorynode = doc.DocumentNode.SelectNodes("//div[@class='block1']");
    if (categorynode != null)
    {
        foreach (HtmlNode Node in categorynode)
        {
            string Html = Node.InnerHtml;
            if (Html != null)
            {
                // Re-parse this block's inner HTML as a separate document
                HtmlDocument Node2 = new HtmlDocument();
                Node2.LoadHtml(Html);

                // SelectNodes returns null (not an empty collection) when nothing matches
                HtmlNodeCollection links = Node2.DocumentNode.SelectNodes("//div[@class='itemMenu level1']/a");
                if (links == null)
                    continue;

                foreach (HtmlNode subbnode in links)
                {
                    // Decode entities such as &amp; in the link text
                    string attt = HtmlEntity.DeEntitize(subbnode.InnerText);
                    HtmlAttribute att = subbnode.Attributes["href"];
                    if (att == null)
                        continue;

                    ModelClass _ms = new ModelClass();
                    _ms.link = att.Value;
                    _ms.Name = attt;
                    _ms.CID = 0;
                    _ms.Type = "Sub Categories";

                    Controller cc = new Controller();
                    cc.InsertCategories(_ms);
                    Console.WriteLine("Sub Categories >>> " + _ms.Name);
                }
            }
        }
    }
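If the same links are reachable from more than one block1 div (an assumption; the question does not show the markup or the database code), one way to stop them from being inserted repeatedly is to scope each query to the current block with .// and skip any href that has already been collected. This is a minimal sketch, not the asker's code: it uses System.Xml's XmlDocument with made-up sample markup so it runs without the HtmlAgilityPack package, and the HashSet stands in for whatever duplicate check the real insert would need.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical sample markup: "/shoes" appears under two block1 containers.
const string markup = "<html><body>" +
    "<div class='block1'><div class='itemMenu level1'><a href='/shoes'>Shoes</a></div>" +
    "<div class='itemMenu level1'><a href='/bags'>Bags</a></div></div>" +
    "<div class='block1'><div class='itemMenu level1'><a href='/shoes'>Shoes</a></div></div>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(markup);

var seen = new HashSet<string>(StringComparer.Ordinal);
var toInsert = new List<(string Link, string Name)>();

foreach (XmlNode block in doc.SelectNodes("//div[@class='block1']")!)
{
    // ".//" keeps the search inside the current block1 div.
    foreach (XmlNode a in block.SelectNodes(".//div[@class='itemMenu level1']/a")!)
    {
        string href = a.Attributes?["href"]?.Value ?? "";
        // HashSet.Add returns false for a link already collected, so repeats are skipped.
        if (href.Length > 0 && seen.Add(href))
            toInsert.Add((href, a.InnerText.Trim()));
    }
}

// In the question's code, this is where InsertCategories would run once per unique link.
foreach (var (link, name) in toInsert)
    Console.WriteLine($"{name} -> {link}");
```

With this sample, only two rows would be inserted (Shoes and Bags), even though the Shoes link appears twice in the page.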

Would I have to use an algorithm or a tree to get only the records that I want? Please guide me.

Upvotes: 1

Views: 692

Answers (1)

Muhammad Mateen

Reputation: 138

Be careful with //div in your selection. The double slash selects the matching divs from the whole source document, even when you call SelectNodes on a single node. Putting a dot before the slashes (.//div) restricts the search to the current div and its descendants.
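To illustrate the difference, here is a small sketch. It uses System.Xml's XmlDocument instead of HtmlAgilityPack so it runs without a NuGet package, and the sample markup is made up and assumed to be well-formed. HtmlAgilityPack's HtmlNode.SelectNodes evaluates the same XPath syntax, so an expression that starts with // is anchored at the document root there too, whichever node you call it on.

```csharp
using System;
using System.Xml;

// Hypothetical sample markup: two "block1" containers, each holding one menu link.
const string markup =
    "<html><body>" +
    "<div class='block1'><div class='itemMenu level1'><a href='/a'>A</a></div></div>" +
    "<div class='block1'><div class='itemMenu level1'><a href='/b'>B</a></div></div>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(markup);

XmlNode firstBlock = doc.SelectSingleNode("//div[@class='block1']")!;

// "//" is anchored at the document root, so this finds the links in BOTH blocks.
XmlNodeList global = firstBlock.SelectNodes("//div[@class='itemMenu level1']/a")!;

// ".//" starts at the context node, so this finds only the first block's link.
XmlNodeList scoped = firstBlock.SelectNodes(".//div[@class='itemMenu level1']/a")!;

Console.WriteLine($"// from first block:  {global.Count} link(s)");  // 2 in this sample
Console.WriteLine($".// from first block: {scoped.Count} link(s)");  // 1 in this sample
```

With HtmlAgilityPack, the scoped form inside the question's loop would be Node.SelectNodes(".//div[@class='itemMenu level1']/a"), which also avoids re-parsing Node.InnerHtml into a second HtmlDocument. It returns null when nothing matches, so check for null before looping.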

Upvotes: 1
