Reputation: 2757
In my country, many websites are censored and blocked, and requests to them are redirected to a certain page instead. I don't know exactly how this is done. Is it possible to programmatically determine whether a website is blocked without loading the whole page? I want to use a web search API (maybe Google's) to build a web service that only returns results that are not blocked.
If it helps, the content of the page that loads instead is like this:
<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1256"><title>M3-5
</title></head><body><iframe src="http://10.10.34.34?type=Invalid Site&policy=MainPolicy " style="width: 100%; height: 100%" scrolling="no" marginwidth="0" marginheight="0" frameborder="0" vspace="0" hspace="0"></iframe></body></html>
I understand that the server running the service would need to have an IP address in my country.
It would be best if there were a way to do this in PHP or Python.
I'd appreciate any help or ideas.
Upvotes: 5
Views: 1041
Reputation: 9709
This should be entirely possible in both PHP and Python. Basically, you want to issue an HTTP request to the site in question and then analyze the response that you get back. Take a look at PHP's cURL extension or Python's PycURL to see how to issue those HTTP requests and process the responses.
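As a rough sketch of the request step in Python, the standard library's `urllib.request` can stand in for PycURL (a substitution made here for brevity, not the library named above). The helper name `fetch_prefix` and the 2048-byte limit are assumptions chosen for illustration; reading only the start of the body avoids pulling down the whole page, and the sample block page is short enough to fit within that limit.

```python
import urllib.request


def fetch_prefix(url, max_bytes=2048, timeout=10):
    """Return at most max_bytes from the start of the response body.

    Hypothetical helper: the name and default limits are illustrative.
    """
    # urlopen follows redirects by default; only the first max_bytes of
    # the body are read here, and the rest is left unread.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(max_bytes)
```

Network errors (timeouts, DNS failures, HTTP error statuses) raise exceptions from `urllib.error`, so a real service would want to catch those and decide how to treat an unreachable site.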
Once you have the response back, it's just a matter of matching it against some reliable pattern for blocked sites. My first guess is that you could look for <body><iframe src="http://10.10.34.34?type=Invalid Site
within the response. However, you would have to check more blocked pages to see whether they all include that string. You might end up needing to broaden the check to a regular expression if, for instance, the IP or the type query-string parameter varies from one blocked site to another.
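A minimal Python sketch of the broadened check might look like the following. It assumes, based only on the single sample page in the question, that block pages embed an iframe pointing at some private `10.x.x.x` address with a `type=` query parameter; the names `BLOCK_PATTERN` and `looks_blocked` are hypothetical, and the pattern would need adjusting after inspecting more blocked pages.

```python
import re

# Assumption: every block page embeds an iframe whose src is a 10.x.x.x
# address followed by a "type=" query parameter, as in the one sample
# page from the question. The IP octets and the type value may vary.
BLOCK_PATTERN = re.compile(
    r'<iframe\s+src="http://10\.\d{1,3}\.\d{1,3}\.\d{1,3}/?\?type=',
    re.IGNORECASE,
)


def looks_blocked(html):
    """Return True if the HTML text matches the assumed block-page pattern."""
    return BLOCK_PATTERN.search(html) is not None
```

If the response arrives as bytes, decode it first, for example with `raw.decode("windows-1256", errors="replace")`, since the sample page declares that charset.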
Upvotes: 3