EL_EL

Reputation: 133

htaccess rule to block hotlinking but allow my domain, social media bots and crawler bots

Issue: I have 'public' folders (https://www.example.com/assets/pub/*) and I want to:

  1. Allow social media bots (e.g. Facebook) for image and URL sharing
  2. Allow known good crawl bots (googlebot|Googlebot-Image/1.0|AdsBot-Google-Mobile|AdsBot-Google|bingbot|BingPreview|msnbot|yahoo)
  3. Allow access from my domain
  4. Deny all other domains from hotlinking to (jpg|jpeg|gif|svg|png|bmp|pdf|webp) files and route a denied request to a 401.
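
A minimal sketch of how the four requirements above might map onto a single rule. It is not a tested solution. It assumes that Facebook's sharing crawler sends a User-Agent containing `facebookexternalhit` (a token that does not appear in the attempts in this post), that the rule sits before the catch-all rewrite to index.php, and that /401unauthorized.php sets the 401 status itself.

```apache
# Sketch only: hotlink protection for the public asset folder.
RewriteEngine On

# Apply only to the public folder
RewriteCond %{REQUEST_URI} ^/assets/pub/ [NC]
# Requirements 1 and 2: skip the block for social media and known crawler user agents
# (facebookexternalhit is an assumption, see the note above)
RewriteCond %{HTTP_USER_AGENT} !(facebookexternalhit|googlebot|AdsBot-Google|bingbot|BingPreview|msnbot|yahoo) [NC]
# Requirement 3: skip the block for requests referred from my own domain
RewriteCond %{HTTP_REFERER} !^https://www\.example\.com(/|$) [NC]
# Requirement 4: everything else requesting these file types goes to the 401 page
RewriteRule \.(jpe?g|gif|svg|png|bmp|pdf|webp)$ /401unauthorized.php [L]
```

As written, a request with an empty Referer header from a client that is not on the user-agent list is also sent to the 401 page. Adding `RewriteCond %{HTTP_REFERER} !^$` before the RewriteRule would let such requests through, if that is preferred.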

Testing methodology:

  1. Facebook bot (https://developers.facebook.com/docs/sharing/bot/): I test with the debug tool at https://developers.facebook.com/tools/debug/

If I get it right, the debug tool is able to detect the image in the OG meta tag. If I get it wrong, I get the error below:

Provided og:image URL, https://www.example.com/assets/pub/page-sm/a0016-23.jpg could not be processed as an image because it has an invalid content type.

  2. Googlebot I can test in Search Console: https://support.google.com/webmasters/answer/7643418

  3. I can also test bingbot: https://www.bing.com/webmasters/urlinspection

I have made multiple attempts to create this, referencing many Stack Overflow articles, without success. The closest examples are shown below in the full htaccess file. However, both attempts fail to allow the Facebook bot, throwing the error in the debug tool, but work for all the other conditions. Note that everything else in the htaccess file works as intended, but I have included it in case something else in there is interfering.

Can anyone help me tune this?

# Enabling Browser Caching
<IfModule mod_expires.c>

 <IfModule mod_headers.c>
    Header append Cache-Control "public"
    <FilesMatch "\.(ico|flv|jpg|jpeg|png|gif|swf|webp)$">
      Header set Cache-Control "max-age=31536000, public"
    </FilesMatch>
    <FilesMatch "\.(html|htm)$">
      Header set Cache-Control "max-age=31536000, private, must-revalidate"
    </FilesMatch>
    <FilesMatch "\.(pdf)$">
      Header set Cache-Control "max-age=31536000, public"
    </FilesMatch>
    <FilesMatch "\.(js|css)$">
      Header set Cache-Control "max-age=0, private, must-revalidate"
    </FilesMatch>
  </IfModule>

</IfModule>

# Add robots tags to private assets
<If "%{REQUEST_URI} =~ m#^/assets/pvt/#">
Header set X-Robots-Tag "noindex, noarchive, nosnippet"
</If>


<IfModule mod_rewrite.c>
    Options +FollowSymLinks
    Options -Indexes
    RewriteEngine On
   
    ErrorDocument 404 /404notFound.php

    # Force browser to use https and security headers
    RewriteCond %{HTTPS} off
    RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
    #NE
    RewriteCond %{HTTP_HOST} !^www\. [NC]
    RewriteRule (.*) https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
    RewriteRule ^(.*)/$ /$1 [L,R]
    
    Header always set Content-Security-Policy "upgrade-insecure-requests;"
    Header always set Strict-Transport-Security "max-age=31536000" env=HTTPS
    Header always set X-Content-Type-Options "nosniff"
    Header always set X-XSS-Protection "1; mode=block"
    Header always set Expect-CT "max-age=7776000, enforce"
    Header always set Referrer-Policy "same-origin"
    Header always set X-Frame-Options "SAMEORIGIN"
    
    # Enable cross domain access control
    SetEnvIf Origin "^http(s)?://(.+\.)?(www\.example\.com)$" REQUEST_ORIGIN=$0
    Header always set Access-Control-Allow-Origin %{REQUEST_ORIGIN}e env=REQUEST_ORIGIN
    Header always set Access-Control-Allow-Methods "GET, POST, PUT, DELETE, OPTIONS"
    Header always set Access-Control-Allow-Headers "x-test-header, Origin, X-Requested-With, Content-Type, Accept"
  
    # Allow hotlinking of public folder
    # RewriteCond %{REQUEST_URI} !^/assets/pub/.*$ [NC] #This is the cond I want to replace!
    
    # ATTEMPT 1
    # Block hotlinking to public assets but allow socialmedia crawlers & known web crawlers
    #AllowList (allowed user agents)
    RewriteCond %{HTTP_USER_AGENT} !(googlebot|Googlebot-Image/1.0|AdsBot-Google-Mobile|AdsBot-Google|AdsBot-Google|bingbot|BingPreview|msnbot|yahoo|FacebookBot/1.0) [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com.*/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com.*$ [NC]
    #blocklist
    RewriteCond %{HTTP_USER_AGENT} (bot|robot|crawl|krawler|spider|libwww-perl.*|-?|\ ) [NC]
    #Rewrite rule
    RewriteRule .*\.(jpg|jpeg|gif|svg|png|bmp|pdf|webp)$ /401unauthorized.php [L]

    # ATTEMPT 2
    # Block hotlinking to public assets but allow socialmedia crawlers & known good web crawlers
    RewriteCond %{HTTP_USER_AGENT} !(googlebot|Googlebot-Image/1.0|AdsBot-Google-Mobile|AdsBot-Google|AdsBot-Google|bingbot|BingPreview|msnbot|yahoo|FacebookBot/1.0) [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com*/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?facebook.com*/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?facebook.com.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?developers.facebook.com*/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?developers.facebook.com.*$ [NC]
    RewriteRule .*\.(jpg|jpeg|gif|svg|png|bmp|pdf|webp)$ /401unauthorized.php [L]
    
    # No hotlinking for pvt asset files
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com/assets/pvt/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com/assets/pvt/$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com.*/.*$ [NC]
    RewriteCond %{HTTP_REFERER} !^https://(\.)?www.example.com.*$ [NC]
    RewriteRule .*\.(jpg|jpeg|gif|svg|png|css|js|bmp|pdf|webp)$ /401unauthorized.php [L]
        
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteCond %{REQUEST_FILENAME} !-l

    ############################################
    ## rewrite everything else to index.php
    RewriteRule .* index.php [L]
</IfModule>

<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/plain
    AddOutputFilterByType DEFLATE text/javascript
    AddOutputFilterByType DEFLATE text/html
    AddOutputFilterByType DEFLATE text/xml
    AddOutputFilterByType DEFLATE text/css
    AddOutputFilterByType DEFLATE text/vtt 
    AddOutputFilterByType DEFLATE text/x-component
    AddOutputFilterByType DEFLATE application/xml
    AddOutputFilterByType DEFLATE application/xhtml+xml
    AddOutputFilterByType DEFLATE application/rss+xml
    AddOutputFilterByType DEFLATE application/js
    AddOutputFilterByType DEFLATE application/javascript
    AddOutputFilterByType DEFLATE application/x-javascript
    AddOutputFilterByType DEFLATE application/x-httpd-php
    AddOutputFilterByType DEFLATE application/x-httpd-fastphp
    AddOutputFilterByType DEFLATE application/atom+xml 
    AddOutputFilterByType DEFLATE application/json
    AddOutputFilterByType DEFLATE application/ld+json 
    AddOutputFilterByType DEFLATE application/vnd.ms-fontobject 
    AddOutputFilterByType DEFLATE application/x-font-ttf 
    AddOutputFilterByType DEFLATE application/font-sfnt
    AddOutputFilterByType DEFLATE application/x-web-app-manifest+json 
    AddOutputFilterByType DEFLATE font/opentype 
    AddOutputFilterByType DEFLATE font/otf
    AddOutputFilterByType DEFLATE font/ttf
    AddOutputFilterByType DEFLATE font/sfnt
    AddOutputFilterByType DEFLATE image/svg+xml
    AddOutputFilterByType DEFLATE image/x-icon 
</IfModule> 

Upvotes: 0

Views: 51

Answers (0)
