jab

Reputation: 5823

Best way to get word frequency counts for a website? Or part of a website?

Pretty simple: I'm just looking for a simple way to extract word frequencies from a given website, or from a section of a website.
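For a single page, Python's standard library is enough. The sketch below is one possible approach, not a canonical one. It assumes the page is UTF-8 HTML, treats a "word" as a run of lowercase letters, digits, or apostrophes, and skips `<script>` and `<style>` contents. The names `TextExtractor`, `html_to_words`, `fetch`, and `word_frequencies` are hypothetical helpers written for this example.

```python
import re
import urllib.request
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects page text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_words(html):
    """Return the lowercase words of the page text, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Assumed word definition: letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())


def fetch(url):
    """Download a page; assumes UTF-8 and replaces undecodable bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def word_frequencies(html):
    """Map each word to the number of times it appears in the page text."""
    return Counter(html_to_words(html))
```

Usage, with a placeholder URL: `word_frequencies(fetch("https://example.com/")).most_common(20)` lists the 20 most common words. For a whole site or section, add the `Counter`s of each page together with `+`.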

I am also interested in calculating the average distance between two given words throughout a website, with distance measured in words.
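"Average distance" can be defined in several ways. The sketch below assumes one reasonable definition: for each occurrence of word `a`, take the distance (difference in word positions) to the nearest occurrence of word `b`, then average those values. Each page is given as a list of lowercase words in document order. Distances are computed per page and pooled, so no distance spans two pages. The function names are hypothetical, and `a` and `b` are assumed to be different words.

```python
from bisect import bisect_left


def nearest_distances(words, a, b):
    """For each occurrence of `a`, the distance in words to the nearest `b`.

    Adjacent words are 1 apart. Returns [] if either word is absent.
    """
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]  # already sorted
    if not pos_b:
        return []
    dists = []
    for i in pos_a:
        k = bisect_left(pos_b, i)
        # The nearest `b` is either the first one at/after i or the last one before i.
        candidates = []
        if k < len(pos_b):
            candidates.append(pos_b[k] - i)
        if k > 0:
            candidates.append(i - pos_b[k - 1])
        dists.append(min(candidates))
    return dists


def average_distance_over_pages(pages, a, b):
    """Pool nearest-`b` distances from every page and average them."""
    dists = [d for words in pages for d in nearest_distances(words, a, b)]
    return sum(dists) / len(dists) if dists else None
```

Binary search over the sorted positions of `b` keeps the cost at O(n log n) per page instead of comparing every pair of occurrences.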

I'm asking this question because, quite frankly, I haven't been able to find much information that builds an intuition for how to perform such a task. I don't have any experience with web spidering or scraping of any kind.
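At its core, a spider is a queue of URLs plus a "seen" set. The loop fetches a page, extracts its links, and queues the ones not visited yet. The minimal breadth-first sketch below uses only the standard library. It assumes that "a section of a website" means "URLs starting with a given prefix". The `fetch` argument is any function from URL to HTML string (or `None` on failure), so it can be tested offline. `LinkExtractor` and `crawl` are hypothetical names for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, prefix=None, max_pages=100):
    """Breadth-first crawl yielding (url, html) for each fetched page.

    Only links starting with `prefix` (default: start_url) are followed,
    which restricts the crawl to one section of the site.
    """
    prefix = prefix or start_url
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = fetch(url)
        fetched += 1
        if html is None:
            continue
        yield url, html
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            # Resolve relative links and drop #fragments so pages aren't revisited.
            link, _ = urldefrag(urljoin(url, href))
            if link.startswith(prefix) and link not in seen:
                seen.add(link)
                queue.append(link)
```

Each `(url, html)` pair can then go through whatever word counting you use. Against a real site, the `fetch` function should also pause between requests and respect `robots.txt`; the standard library's `urllib.robotparser` can read that file.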

Thanks! (I asked this question earlier, but it wasn't well formed.)

Upvotes: 4


Answers (1)

KostasT

Reputation: 217

You could try Scrapy. It is quite a powerful tool for scraping websites, but it may require some knowledge of regular expressions and XPath. Try following its tutorial.

Upvotes: 1
