Reputation: 75
I am writing a small program that searches different websites for certain words. If a word is missing or no longer present on its page, I want to get an error message.
I want to keep the code relatively compact and therefore use arrays for the URLs and the words.
Unfortunately it seems that you can only search single strings:
string checkWord = doc[0].DocumentNode.SelectSingleNode("//*[text()[contains(., 'Word1')]]").InnerText;
// (= no error)
But I want to run the whole command in a loop and use an array of words instead of 'Word1', so that each website is automatically searched for its respective word:
string checkWord = doc[i].DocumentNode.SelectSingleNode("//*[text()[contains(., word[i])]]").InnerText;
// (= error)
Does anyone know how I can insert a variable (an array element) into the XPath string instead of a fixed text?
I hope I was able to explain my problem in an understandable way and there is someone who can help me :)
PS: the whole script would look something like this:
HtmlWeb web = new HtmlWeb();
string[] words = new string[] {"word1", "word2", "word3"};
HtmlDocument[] doc = new HtmlDocument[] {web.Load("www.url1.com"), web.Load("www.url2.com"), web.Load("www.url3.com"),};
for (int i = 0; i < doc.Length; i++)
{
    try
    {
        string checkWord = doc[i].DocumentNode.SelectSingleNode("//*[text()[contains(., words[i])]]").InnerText;
    }
    catch (Exception)
    {
        Console.WriteLine("Word {0} is not available", i);
        continue;
    }
}
Upvotes: 1
Views: 282
Reputation: 115047
It is probably easier to use SelectNodes("//text()") to grab all the text nodes and then do the contains check with a LINQ statement back in C#.
For example, this code will return all the words that exist on the loaded page:
using System;
using System.Linq;
using HtmlAgilityPack;

string[] words = new string[] { "jesse", "jessehouwing", "word3" };
var web = new HtmlWeb();
HtmlDocument[] doc = new HtmlDocument[] { web.Load("https://jessehouwing.net") };
for (int i = 0; i < doc.Length; i++)
{
    // Grab every text node, then keep each word that occurs in at least one of them.
    var check = doc[i].DocumentNode.SelectNodes("//text()")
        .SelectMany(node => words.Where(word => node.InnerText.Contains(word, StringComparison.CurrentCultureIgnoreCase)))
        .Distinct();
}
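If you would rather keep the XPath approach from the question, the variable has to be placed outside the string literal, because anything inside the quotes is sent to XPath as plain text. Below is a minimal sketch of that idea, not a definitive implementation. The class XPathDemo, the helper BuildContainsXPath and the inline HTML strings are made up for illustration; the inline HTML stands in for the pages loaded with web.Load. It assumes the search words contain no single quote, since XPath 1.0 has no escape for one inside a quoted literal.

```csharp
using System;
using HtmlAgilityPack;

public static class XPathDemo
{
    // Hypothetical helper: embeds a search word in the XPath from the question.
    // Assumption: the word contains no single quote.
    public static string BuildContainsXPath(string word)
    {
        return $"//*[text()[contains(., '{word}')]]";
    }

    public static void Main()
    {
        string[] words = { "word1", "word2" };

        // Inline HTML used here instead of web.Load so the sketch needs no network.
        string[] pages =
        {
            "<html><body><p>here is word1</p></body></html>",
            "<html><body><p>nothing relevant</p></body></html>"
        };

        for (int i = 0; i < pages.Length; i++)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(pages[i]);

            // SelectSingleNode returns null when nothing matches,
            // so a null check can replace the try/catch.
            HtmlNode match = doc.DocumentNode.SelectSingleNode(BuildContainsXPath(words[i]));

            if (match == null)
                Console.WriteLine("Word {0} is not available", words[i]);
            else
                Console.WriteLine("Word {0} was found", words[i]);
        }
    }
}
```

Note that the XPath contains() function is case-sensitive, whereas the LINQ version above ignores case, so the two approaches can give different results for the same page.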
Upvotes: 2