Schotsl

Reputation: 227

Creating a simplified page for web scrapers

A client of mine has a very stylized vacancies page that includes things like tables, graphs, and forms. It looks great when someone visits the website, but a lot of other sites scrape the page and show the vacancy on their own site.

Previously their site was a simple page with basic markup such as headings, paragraphs, and bold text. The other websites picked that up just fine, but with the more advanced markup they're falling behind.

How could I make a special page for the web scrapers? To be more specific: how could I detect in PHP that a web scraper is looking at the page? From there I could figure out how to make a custom CMS page for the client, so they can use it to fill in the simple markup themselves.

Upvotes: 1

Views: 67

Answers (1)

RoToRa

Reputation: 38441

First off, headings, paragraphs, etc. are good markup. If the "advanced markup" doesn't have them, then it isn't "advanced markup" at all, but bad markup. So regardless of whether your pages are being scraped, you should be using semantic markup anyway. Additionally, there are more ways to give the HTML meaning, such as microdata.
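As a rough sketch of what microdata could look like for a vacancy, using the schema.org JobPosting vocabulary (the property names are real schema.org properties; the job details are placeholders):

```html
<!-- Hypothetical vacancy marked up with schema.org JobPosting microdata -->
<article itemscope itemtype="https://schema.org/JobPosting">
  <h2 itemprop="title">Frontend Developer</h2>
  <p itemprop="description">Build and maintain our client-facing web pages.</p>
  <span itemprop="hiringOrganization" itemscope itemtype="https://schema.org/Organization">
    <span itemprop="name">Example Corp</span>
  </span>
  <time itemprop="datePosted" datetime="2021-06-01">1 June 2021</time>
</article>
```

Scrapers and search engines that understand microdata can then extract the vacancy fields without having to guess at the page layout.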

But since you seem to know the web scrapers (or at least know about them) and have given them permission (implicitly or explicitly) to crawl the site, their operators should provide documentation that tells you exactly what they are looking for.

Preferably these operators shouldn't be using web scrapers at all; instead they should receive the information they are looking for from you in a structured format, such as JSON or XML, which you generate in addition to the regular HTML pages.
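A minimal sketch of such a JSON endpoint in PHP. The `$vacancies` array here is placeholder data; in practice it would come from the client's CMS:

```php
<?php
// Placeholder data; in a real setup this would be loaded from the CMS.
$vacancies = [
    ['title' => 'Frontend Developer', 'location' => 'Amsterdam'],
    ['title' => 'PHP Developer',      'location' => 'Utrecht'],
];

// Serve the vacancies as JSON alongside the regular HTML pages.
header('Content-Type: application/json; charset=utf-8');
echo json_encode(['vacancies' => $vacancies], JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
```

The other sites can then fetch this URL directly instead of scraping the styled page, and your client's HTML can change freely without breaking them.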

Upvotes: 3
