Reputation: 920
I am trying to debug my website's .htaccess and robots.txt. I want to use cURL or wget to request files that I blocked in robots.txt, and pages that .htaccess should redirect to another location.
I have the following in my robots.txt:
User-agent: *
Disallow: /wp/wp-admin/
Yet I am still able to fetch it:
wget
$ wget http://xxxx.com/wp/wp-admin/
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
--2017-08-28 07:37:05-- http://xxxx.com/wp/wp-admin/
Resolving xxxx.com... 118.127.47.249
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1 [following]
--2017-08-28 07:37:12-- http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2891 (2.8K) [text/html]
Saving to: `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1'
100%[==============================================================================>] 2,891 --.-K/s in 0.1s
2017-08-28 07:37:17 (22.2 KB/s) - `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1' saved [2891/2891]
curl
$ curl -L xxx.com/wp/wp-admin -o wp-admin.html
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1147  100  1147    0     0    107      0  0:00:10  0:00:10 --:--:--   280
  0     0    0     0    0     0      0      0 --:--:--  0:01:37 --:--:--     0
100  2891  100  2891    0     0     17      0  0:02:50  0:02:42  0:00:08   234
Neither wget nor curl respected robots.txt. Is there a way to check how my .htaccess and robots.txt behave? Thanks!
Upvotes: 2
Views: 5181
Reputation: 7031
robots.txt is purely advisory and is aimed at search engine bots. Browsers and command-line clients such as curl ignore it, and wget only consults it during recursive downloads, not when fetching a single URL as you did. If you want to check that your robots.txt is parseable, you can use Google's checker in the webmaster console, which shows any errors and issues in the file.
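To check the rules themselves rather than the server, one option is to parse the file locally. This is a minimal sketch, assuming Python 3 and its standard `urllib.robotparser` module; the helper name `is_allowed` and the `Googlebot` user-agent string used below are illustrative choices, not something from the question.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt shown in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp/wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    is_allowed is a hypothetical helper; it only evaluates the rules and
    never contacts the web server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Calling `is_allowed(ROBOTS_TXT, "Googlebot", "http://xxxx.com/wp/wp-admin/")` reports the admin path as blocked for a well-behaved crawler, while `/wp/wp-login.php` stays allowed. Note that this parser matches by path prefix, so `/wp/wp-admin` without the trailing slash is not covered by the rule as written.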
Redirects using .htaccess are applied by the server, so they should work with any client, and wget prints each redirect it follows.
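If you would rather inspect a redirect without following it, a small script can report the status code and `Location` header of the first response. This is a rough sketch, assuming Python 3 and the standard `http.client` module; `fetch_redirect` is a hypothetical helper name, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url: str, timeout: float = 10.0):
    """Send one HEAD request and return (status, Location header or None).

    The URL must include a scheme (http:// or https://). http.client
    never follows redirects itself, so the first response is returned.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        # Rebuild the request target from the path and optional query string.
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

For example, `fetch_redirect("http://xxxx.com/wp/wp-admin/")` would return the status code together with whatever `Location` the server sends, so an .htaccess rule can be compared against its intended target one hop at a time.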
Upvotes: 3