vrghost

Reputation: 1224

node express routing decision based on user-agent

I'm trying to figure out a way of supplying better data (Open Graph data) to social media. Basically, when Facebook, Twitter or Pinterest asks for information about a link on my page, I want to provide OG information that depends on the link, instead of sending them the empty page (OK, it sends JavaScript that they don't run).

I tried using prerender and similar, but can't get that to run properly. I also realised that I would rather get the Express router to identify the request and serve a static page based on it.

As a first step, I need to get the user agent information:

So I thought I would add express-useragent, and that seems to work on my test site, but it does not seem like Facebook's scraper ever gets past it. I can see it tries to fetch a picture, but it never updates the OG data or the index. (The code below should work as an example.)

var express = require('express');
var router = express.Router();
var useragent = require('express-useragent');



//Set up log
var cfgBunyan = require('../config/bunyan')
var log = cfgBunyan.dbLogger('ROUTE')


router.use(useragent.express());
/* GET home page. */
router.get('/', function(req, res, next) {

  console.log(req.useragent);
  res.render('index');
});

router.get('/share/:service', function(req, res, next) {
  res.render('index');
});

router.get('/pages/:name', function (req,res, next){
  log.info('/pages/'+req.params.name)
  res.render('pages/'+req.params.name);
});
router.get('/modals/:name', function (req,res, next){
  res.render('modals/'+req.params.name);
});


router.get('/page/:name', function (req,res, next){
  res.render('index');
});


module.exports = router;

I can also run the Google test scraper, which gives me the following source:

source: 'Mozilla/5.0 (compatible; Google-Structured-Data-Testing-Tool +https://search.google.com/structured-data/testing-tool)' }

So has anyone figured out an easy way to direct Facebook and Twitter to another route? Or is sitting and checking the different sources the right way?

Upvotes: 0

Views: 2194

Answers (1)

vrghost

Reputation: 1224

OK, so I managed to figure out a potential solution. Basically, I created a function called isBot, which I call in the same way authentication middleware works: the request is sent to isBot, which checks whether (1) ?_escaped_fragment_= is present in the URL (Google and some others use that), or (2) the user agent is a known bot (thanks prerender.io, I borrowed the list from the .htaccess for your service).
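The two checks can also be sketched as a standalone function that needs no Express objects, which makes them easy to try out in isolation. This is only an illustration: detectBot and KNOWN_BOTS are hypothetical names, the list is an abbreviated lowercase copy of the one used further down, and the user agent is lowercased before matching because real crawlers send mixed-case strings such as "Twitterbot/1.0".

```javascript
// Hypothetical helper (not part of express or prerender): classifies a request
// from its URL and User-Agent string alone.
const KNOWN_BOTS = [
  "baiduspider", "facebookexternalhit", "twitterbot", "rogerbot",
  "linkedinbot", "embedly", "quora link preview", "outbrain",
  "pinterest", "slackbot", "vkshare", "w3c_validator"
];

function detectBot(url, userAgent) {
  // Check 1: the crawler announced itself with ?_escaped_fragment_=
  if (url.includes("?_escaped_fragment_=")) {
    return "ESCAPED_FRAGMENT_REQ";
  }
  // Check 2: the User-Agent contains a known bot name.
  // The header may be missing, so fall back to an empty string.
  const ua = (userAgent || "").toLowerCase();
  for (const bot of KNOWN_BOTS) {
    if (ua.includes(bot)) {
      return bot;
    }
  }
  return null; // Not recognised as a bot
}
```

Inside a route handler this could be called as detectBot(req.url, req.get('User-Agent')); a null result means the request is treated as a normal browser.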

The setup is simple enough. I originally added express-useragent to the router just to be able to read the header, but you don't have to (Rob was right):

//var useragent = require('express-useragent'); //Not needed nor used
//router.use(useragent.express()); // Thought this was required, it is not

Then in any route you want to check for bots add isBot:

router.get('/', isBot ,function(req, res, next) {

Then add the function below. It does a lot of logging using bunyan, as I want to have statistics. You can remove any line that starts with log.info and it should still work, or add bunyan, or just change those lines to console.log. It's just output.

If the code decides the request isn't from a bot, it just renders as normal.

function isBot (req, res, next){
  var isBotTest = false;
  var botReq = "";
  var botID= ""; //Just so we know why we think it is a bot
  //Each entry is used as a regular expression pattern, so "a|b" matches either name
  var knownBots = ["baiduspider", "facebookexternalhit", "twitterbot", "rogerbot", "linkedinbot","embedly|quora link preview","howyoubot","outbrain","pinterest","slackbot","vkShare","W3C_Validator"];
  log.info({http_user_agent: req.get('User-Agent')});
  //log.info({user_source: req.useragent.source}); //For debug, whats the HTTP_USER_AGENT, think this is the same
  log.info({request_url: req.url}); //For debug, we want to know if there are any options

  /* Let's start with ?_escaped_fragment_=, which seems to be a standard. If this is part of the request,
  it should be either a search engine or a social media site asking for Open Graph rich sharing info
   */
  var urlRequest=req.url
  var pos= urlRequest.search("\\?_escaped_fragment_=")

  if (pos != -1) {
    botID="ESCAPED_FRAGMENT_REQ";
    isBotTest = true; //It says it's a bot, so we believe it; let's figure out if the request comes before or after
    var reqBits = urlRequest.split("?_escaped_fragment_=")
    console.log(reqBits[1].length)
    if(reqBits[1].length == 0){ //If 0 length, any request is in front
      botReq = reqBits[0];
    } else {
      botReq = reqBits[1];
    }

  } else { //OK, so it did not tell us it was a bot request, but maybe it is anyway
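      var userAgent = req.get('User-Agent') || ""; //The header may be missing
      for (var i = 0; i < knownBots.length; i++){
        //Match case-insensitively: real agents look like "Twitterbot/1.0"
        if (new RegExp(knownBots[i], "i").test(userAgent)){
          isBotTest = true;
          botReq=urlRequest;
          botID=knownBots[i];
        }
      }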
  }


  if (isBotTest == true) {
    log.info({botID: botID, botReq: botReq});
    //send something to bots
  } else {
    log.info("We don't think this is one of those bots any more")
    return next();
  }

}

Oh, and currently it does not respond to the bot requests, so a request identified as a bot gets no response at all. If you want to respond, just add a res.render or res.send at the line that says //send something to bots
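As a rough idea of what could be sent at that line, here is a minimal sketch that builds a static HTML page containing only Open Graph meta tags. renderOgPage and escapeHtml are hypothetical helpers, and where the title, image and URL for a given link come from is left open; that lookup is an assumption and not something the code above provides.

```javascript
// Hypothetical helpers: build a bare HTML page whose <head> carries og:* tags.
function escapeHtml(value) {
  // Escape the characters that could break out of an attribute value
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderOgPage(og) {
  // og is a plain object such as { title: "...", url: "...", image: "..." }
  var tags = Object.keys(og).map(function (key) {
    return '<meta property="og:' + escapeHtml(key) +
           '" content="' + escapeHtml(og[key]) + '">';
  });
  return "<!DOCTYPE html><html><head>" + tags.join("") +
         "</head><body></body></html>";
}
```

In isBot this might be used as res.send(renderOgPage({title: "My page", type: "website", url: "https://example.com" + botReq})), where the title and host name are placeholders.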

Upvotes: 1
