Reputation: 25387
I am intending to use Angular Universal for server-side rendering (SSR), but this should only be done for crawlers and bots from selected search engines.
What I want is the following schema (source: https://dingyuliang.me/use-prerender-improve-angularjs-seo/):
After following the official instructions to set up SSR I can now validate that Googlebot (finally) "sees" my website and should be able to index it.
However, at the moment all requests are rendered on the server. Is there a way to determine whether incoming requests are coming from search engines and pre-render the site only for them?
Upvotes: 13
Views: 2544
Reputation: 439
I just managed to do what you wanted, but did not find any answer providing a detailed step-by-step guide with Angular Universal and an Express server, so I am posting my solution here. Any ideas for improvement are welcome!
First, add this function to your server.ts:
function isBot(req: any): boolean {
  let botDetected = false;
  const userAgent = req.headers['user-agent'];
  if (userAgent) {
    // first mode: simple manual check against a few well-known crawlers
    if (userAgent.includes("Googlebot") ||
        userAgent.includes("Bingbot") ||
        userAgent.includes("WhatsApp") ||
        userAgent.includes("facebook") ||
        userAgent.includes("Twitterbot")
    ) {
      console.log('bot detected with includes ' + userAgent);
      return true;
    }
    // second mode: match against the patterns of the 'crawler-user-agents' package
    const crawlers = require('crawler-user-agents');
    crawlers.every((entry: any) => {
      if (RegExp(entry.pattern).test(userAgent)) {
        console.log('bot detected with crawler-user-agents ' + userAgent);
        botDetected = true;
        return false; // returning false stops the iteration
      }
      return true;
    });
    if (!botDetected) { console.log('bot NOT detected ' + userAgent); }
    return botDetected;
  } else {
    // no user-agent header: assume the request comes from a bot
    console.log('No user agent in request');
    return true;
  }
}
This function uses two modes to detect crawlers (and assumes that the absence of a user-agent means the request comes from a bot): first, a 'simple' manual check for a string within the request's user-agent header, and second, a more advanced detection based on the package 'crawler-user-agents', which you can install in your Angular project like this:
npm install --save crawler-user-agents
Second, once this function is added to your server.ts, use it in each
server.get(`/whatever`, (req: express.Request, res: express.Response) => {
})
of your Express server export function, for every 'whatever' route that should behave differently depending on bot detection.
Your 'server.get()' functions become:
server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // if no bot is detected we just return the index.html for CSR
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // otherwise we prerender on the server
  res.render(indexHtml, {
    req, providers: [
      { provide: REQUEST, useValue: req }
    ]
  });
});
To further reduce the server load when a bot requests a page, I also implemented 'node-cache', because in my case SEO bots do not need the very latest version of each page. For this I found a good answer here: #61939272
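For reference, here is a rough sketch of what that caching could look like, assuming 'node-cache' is installed (npm install --save node-cache) and reusing isBot(), distFolder and indexHtml from the code above; treat it as a starting point rather than a drop-in solution:
const NodeCache = require('node-cache');
const ssrCache = new NodeCache({ stdTTL: 60 * 60 }); // keep rendered pages for one hour

server.get(`/whatever`, (req: express.Request, res: express.Response) => {
  if (!isBot(req)) {
    // regular users still get the index.html for CSR
    res.sendFile(join(distFolder, 'index.html'));
    return;
  }
  // serve a cached copy to bots when one is available
  const cached = ssrCache.get(req.originalUrl);
  if (cached) {
    res.send(cached);
    return;
  }
  // otherwise render once, store the HTML, then serve it
  res.render(indexHtml,
    { req, providers: [{ provide: REQUEST, useValue: req }] },
    (err, html) => {
      if (err) {
        console.error(err);
        res.status(500).send('Error rendering page');
        return;
      }
      ssrCache.set(req.originalUrl, html);
      res.send(html);
    });
});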
Upvotes: 0
Reputation: 4220
This is what I came up with for IIS:
In order to get rid of complex folder structures, change the following line in server.ts
const distFolder = join(process.cwd(), 'dist/<Your Project>/browser');
to this:
const distFolder = process.cwd();
Build the project with the npm run build:ssr command. You will end up with the browser and server folders inside the dist folder. Create a folder for hosting in IIS and copy the files that are in the browser and server folders into the created folder.
iis\
-assets\
-favicon.ico
-index.html
-main.js => this is the server file
-main-es2015.[...].js
-polyfills-es2015.[...].js
-runtime-es2015.[...].js
-scripts.[...].js
-...
Add a new file to this folder named web.config with this content:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <rule name="Angular Routes" stopProcessing="true">
          <match url=".*" />
          <conditions logicalGrouping="MatchAll">
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
            <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
            <add input="{HTTP_USER_AGENT}" pattern="(.*[Gg]ooglebot.*)|(.*[Bb]ingbot.*)" negate="true" />
          </conditions>
          <action type="Rewrite" url="/index.html" />
        </rule>
        <rule name="ReverseProxyInboundRule1" stopProcessing="true">
          <match url=".*" />
          <conditions>
            <add input="{HTTP_USER_AGENT}" pattern="(.*[Gg]ooglebot.*)|(.*[Bb]ingbot.*)" />
          </conditions>
          <action type="Rewrite" url="http://localhost:4000/{R:0}" />
        </rule>
      </rules>
    </rewrite>
    <directoryBrowse enabled="false" />
  </system.webServer>
</configuration>
Inside this folder open a Command Prompt or PowerShell and run the following:
> node main.js
Now you should be able to view your server-side rendered website at localhost:4000 (if you haven't changed the port).
Install the IIS Rewrite Module.
IIS will redirect requests that have googlebot or bingbot in their user agent to localhost:4000, which is handled by Express and will return server-side rendered content.
You can test this with Google Chrome: open the Developer Console, select "More tools > Network conditions" from the menu, then in the User agent section disable "Select automatically" and choose Googlebot.
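If you want to verify the bot path outside the browser as well, here is a minimal sketch of a comparison script, assuming Node 18+ (for the built-in fetch) and a placeholder http://localhost/ URL that you would point at your IIS site; it requests the same page with a Googlebot user agent and with a regular browser user agent so you can compare the two responses:
const url = 'http://localhost/'; // placeholder, adjust to your host

async function fetchAs(userAgent: string): Promise<string> {
  const res = await fetch(url, { headers: { 'User-Agent': userAgent } });
  return res.text();
}

(async () => {
  const botHtml = await fetchAs('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
  const browserHtml = await fetchAs('Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0');
  // The pre-rendered (bot) response is usually much larger than the CSR index.html shell.
  console.log('bot response length:', botHtml.length);
  console.log('browser response length:', browserHtml.length);
})();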
Upvotes: 1
Reputation: 576
You can achieve that with Nginx.
In Nginx you can forward the request to the Angular application served by Universal via..
if ($http_user_agent ~* "googlebot|yahoo|bingbot") {
    proxy_pass http://127.0.0.1:5000;
    break;
}
root /var/www/html;
..assuming that you are serving Angular Universal via http://127.0.0.1:5000. In case a browser user agent comes along, we serve the page via root /var/www/html.
So the complete config would be something like..
server {
    listen 80 default;
    server_name angular.local;

    location / {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $http_host;

        if ($http_user_agent ~* "googlebot|yahoo|bingbot") {
            proxy_pass http://127.0.0.1:5000;
            break;
        }

        root /var/www/html;
    }
}
Upvotes: 3