Daniel Lip

Reputation: 11321

How do I loop through two Lists to compare items in both Lists?

I have this code:

private void removeDuplicates(List<string> currentSites, List<string> visitedSites)
{
    for (int i = 0; i < currentSites.Count; i++)
    {
        for (int x = 0; x < visitedSites.Count; x++)
        {

        }
    }
}

I'm getting two Lists, and first I need to compare each item in one List against every item in the other List; if an item exists in the other List, I want to mark it as null.

That is, I need to check whether the items in visitedSites are also in currentSites: take one item, go over the whole other List, check whether it exists there, and if it does, mark it as null.

In any case I need to use two loops, one nested inside the other.

When I find a duplicate, I mark it as null and then break;.

Then, if I'm not mistaken, I need to add another for loop over the List to remove all the items marked null.

The idea is to compare the Lists by marking the duplicated items as null and then removing all the nulls.
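
Something like this is what I have in mind (just a sketch, assuming the duplicates should be marked and then removed in visitedSites, and that null is a safe marker since the Lists hold string):

    private void removeDuplicates(List<string> currentSites, List<string> visitedSites)
    {
        // First pass: mark every item of visitedSites that also appears
        // in currentSites as null (the break assumes each value occurs
        // at most once in visitedSites).
        for (int i = 0; i < currentSites.Count; i++)
        {
            for (int x = 0; x < visitedSites.Count; x++)
            {
                if (visitedSites[x] == currentSites[i])
                {
                    visitedSites[x] = null;
                    break;
                }
            }
        }

        // Second pass: remove all the marked (null) entries.
        visitedSites.RemoveAll(item => item == null);
    }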

Here is the code from the beginning:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using HtmlAgilityPack;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Linq;
using System.Net;
using System.Web;


namespace GatherLinks
{
    public partial class Form1 : Form
    {
        List<string> currentCrawlingSite;
        List<string> sitesToCrawl;
        int actual_sites;
        BackgroundWorker worker;
        int sites = 0;
        int y = 0;
        string guys = "http://www.google.com";

        public Form1()
        {
            InitializeComponent();

            currentCrawlingSite = new List<string>();
            sitesToCrawl = new List<string>();
            actual_sites = 0;
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }


        private List<string> getLinks(HtmlAgilityPack.HtmlDocument document)
        {
            List<string> mainLinks = new List<string>();

            // Collect the href value of every anchor tag on the page.
            var linkNodes = document.DocumentNode.SelectNodes("//a[@href]");
            if (linkNodes != null)
            {
                foreach (HtmlNode link in linkNodes)
                {
                    var href = link.Attributes["href"].Value;
                    mainLinks.Add(href);
                }
            }
            return mainLinks;
        }


        private List<string> webCrawler(string url, int levels, DoWorkEventArgs eve)
        {
            HtmlAgilityPack.HtmlDocument doc;
            HtmlWeb hw = new HtmlWeb();
            List<string> webSites;
            List<string> csFiles = new List<string>();

            csFiles.Add("temp string to know that something is happening in level = " + levels.ToString());
            csFiles.Add("current site name in this level is : " + url);

            try
            {
                doc = hw.Load(url);
                currentCrawlingSite.Add(url);
                webSites = getLinks(doc);

                // Filter out links that were already crawled or queued.
                removeDuplicates(currentCrawlingSite, webSites);
                removeDuplicates(currentCrawlingSite, sitesToCrawl);
                sitesToCrawl = webSites;

                if (levels == 0)
                {
                    return csFiles;
                }
                else
                {
                    // Crawl at most 20 of the collected links, one level deeper.
                    for (int i = 0; i < webSites.Count && i < 20; i++)
                    {
                        int mx = Math.Min(webSites.Count, 20);

                        if (worker.CancellationPending == true)
                        {
                            eve.Cancel = true;
                            break;
                        }
                        else
                        {
                            string t = webSites[i];
                            if (t.StartsWith("http://") || t.StartsWith("https://"))
                            {
                                actual_sites++;
                                csFiles.AddRange(webCrawler(t, levels - 1, eve));
                                this.Invoke(new MethodInvoker(delegate { Texts(richTextBox1, "Level Number " + levels + " " + t + Environment.NewLine, Color.Red); }));
                                worker.ReportProgress(Math.Min((int)((double)i / mx * 100), 100));
                            }
                        }
                    }

                    return csFiles;
                }
            }
            catch
            {
                return csFiles;
            }
        }

So I'm calling the removeDuplicates function twice, and inside removeDuplicates I need to do the things I wrote above. I'm also not sure whether to do sitesToCrawl = webSites; or to somehow add the links in webSites to sitesToCrawl. The idea is that when I loop over webSites there will be no duplicated items when adding to the csFiles List.
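
For example, instead of sitesToCrawl = webSites; maybe something like this (just a sketch, assuming exact string comparison is enough):

    // Append only the links that are not already queued,
    // instead of replacing the whole list (needs System.Linq).
    sitesToCrawl.AddRange(webSites.Except(sitesToCrawl));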

Upvotes: 0

Views: 294

Answers (1)

Alessandro

Reputation: 3761

Not sure if I understand your problem:

IEnumerable<string> notVisitedSites = currentSites.Except(visitedSites);
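
Except returns the items of currentSites that do not appear in visitedSites (it is a set operation, so it also drops duplicates within currentSites itself). Applied to the code in your question, a sketch using your field names would be:

    // Keep only the links that have not been crawled yet,
    // replacing the two removeDuplicates calls (needs System.Linq).
    webSites = getLinks(doc).Except(currentCrawlingSite).ToList();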

Upvotes: 2
