David

Reputation: 75

HtmlAgilityPack: search websites for strings from an array

I am writing a small program that searches different websites for certain words. If a word is missing, or no longer present, on its website, I want to print an error message.

I want to keep the code relatively compact and therefore use arrays for the URLs and the words.

Unfortunately, it seems that you can only search for a single hard-coded string:

string checkWord = doc[0].DocumentNode.SelectSingleNode("//*[text()[contains(., 'Word1')]]").InnerText;

// (= no error)

But I want to put the whole command in a loop and use an array of all the words instead of 'Word1', so that every website is automatically searched for its respective word:

string checkWord = doc[i].DocumentNode.SelectSingleNode("//*[text()[contains(., 
        word[i])]]").InnerText;

// (= error)

Does anyone know how I can insert a variable (an array element) into the XPath string instead of a fixed text?

I hope I was able to explain my problem in an understandable way and there is someone who can help me :)

P.S. The whole script would be something like:

HtmlWeb web = new HtmlWeb();

string[] words = new string[] {"word1", "word2", "word3"};
HtmlDocument[] doc = new HtmlDocument[] {web.Load("www.url1.com"), web.Load("www.url2.com"), web.Load("www.url3.com"),};


for (int i = 0; i < doc.Length; i++)
{
    try
    {
        string checkWord = doc[i].DocumentNode.SelectSingleNode("//*[text()[contains(., 
        words[i])]]").InnerText;
    }
    catch(Exception)
    {
        Console.WriteLine("Word {0} is not available", i);
        continue;
    }
}

Upvotes: 1

Views: 282

Answers (1)

jessehouwing

Reputation: 115047

Probably easier to just use SelectNodes("//text()") to grab all the text nodes and then a LINQ statement back in C# land to do the contains.

For example, this code returns all the words from the array that occur on the loaded page:

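using System;
using System.Linq;
using HtmlAgilityPack;

string[] words = new string[] { "jesse", "jessehouwing", "word3" };
var web = new HtmlWeb();
HtmlDocument[] doc = new HtmlDocument[] { web.Load("https://jessehouwing.net") };

for (int i = 0; i < doc.Length; i++)
{
    // Grab every text node, then keep each word that occurs in at least one of them.
    // Contains(string, StringComparison) is not available on .NET Framework; it needs .NET Core 2.1 or later.
    var check = doc[i].DocumentNode.SelectNodes("//text()")
        .SelectMany(node => words.Where(word => node.InnerText.Contains(word, StringComparison.CurrentCultureIgnoreCase)))
        .Distinct();
}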

Results: the 2 matching words on the loaded page.
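
If you would rather keep the XPath query from the question: written inside the quotes, words[i] is just part of the XPath text, not the C# variable, so the value has to be concatenated into the query string. Below is a minimal sketch of that step only. The helper name BuildContainsXPath is hypothetical (it is not part of HtmlAgilityPack), and the sketch deliberately avoids the HtmlAgilityPack dependency so it runs on its own.

```csharp
using System;

static class XPathHelper
{
    // Hypothetical helper: builds an XPath that matches any element with a
    // text node containing `word`. XPath 1.0 string literals have no escape
    // character, so the word is wrapped in whichever quote it does not contain.
    public static string BuildContainsXPath(string word)
    {
        if (word.Contains("'") && word.Contains("\""))
        {
            // Such words would need XPath concat(); left out of this sketch.
            throw new ArgumentException("Word contains both quote characters.", nameof(word));
        }

        string literal = word.Contains("'")
            ? "\"" + word + "\""
            : "'" + word + "'";

        return "//*[text()[contains(., " + literal + ")]]";
    }
}

static class Program
{
    static void Main()
    {
        string[] words = { "word1", "word2", "word3" };

        for (int i = 0; i < words.Length; i++)
        {
            // The variable is now part of the query text that gets built.
            Console.WriteLine(XPathHelper.BuildContainsXPath(words[i]));
        }
    }
}
```

The resulting string can be passed to doc[i].DocumentNode.SelectSingleNode(...). That method returns null when nothing matches, so a null check on the result can replace the try/catch around .InnerText.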

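To get the error message the question asks for, the list of words can be compared against the text that was found. The sketch below shows that comparison on plain strings; the class and method names are hypothetical, the sample texts are made-up stand-ins for the InnerText of each text node, and it uses an ordinal case-insensitive comparison via IndexOf so that it also compiles on older frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MissingWords
{
    // Hypothetical helper: returns the words that occur in none of the
    // given text fragments, ignoring case.
    public static List<string> FindMissing(IEnumerable<string> texts, IEnumerable<string> words)
    {
        List<string> textList = texts.ToList();

        return words
            .Where(word => !textList.Any(
                text => text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToList();
    }
}

static class MissingWordsDemo
{
    static void Main()
    {
        // Placeholder data standing in for the text nodes of a loaded page.
        string[] texts = { "Posts by Jesse Houwing", "Contact" };
        string[] words = { "jesse", "jessehouwing", "word3" };

        foreach (string word in MissingWords.FindMissing(texts, words))
        {
            Console.WriteLine("Word {0} is not available", word);
        }
    }
}
```

With HtmlAgilityPack, the texts argument would come from the InnerText of the nodes returned by SelectNodes("//text()"), after checking that the result is not null.
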
Upvotes: 2
