pavelkolodin

Reputation: 2967

How does Google index web chats that load messages dynamically via XHR or WebSocket?

Why am I able to find messages from (for example) gitter.im through Google? How did Google index all of this: https://gitter.im/neoclide/coc.nvim?at=5ea00cdda3612210839689f1 ?

Does gitter.im return its content to Google in another format, or via some specific interface/protocol declared in a special section for web crawlers somewhere? Did Google spend development resources on building a gitter.im-specific crawler that is able to make the specific XHR requests?

Upvotes: 0

Views: 55

Answers (1)

pavelkolodin

Reputation: 2967

Simple: no site-specific crawler is needed, because the messages are already in the HTML.

  1. Google requests https://gitter.im/gitter/developers
  2. The N most recent messages (say 50) are already embedded in the returned HTML. Google just extracts all the links from that HTML (from the time tag "18:15" on each message, for example). Each time tag links to a URL of the form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e, and Google doesn't care why. It just remembers the URLs.
  3. Google requests those 50 collected URLs of the form https://gitter.im/gitter/developers?at=610011abc9f8852a970e808e
  4. Each such URL returns ~50 messages around that exact message, again embedded in the HTML. So the search engine concludes: "OK, this URL contains THIS text".
  5. So when you search for THIS text, it gives you the URL that best matches that text, or maybe just any URL containing that text...
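
To make step 2 concrete, here is a minimal sketch of how a generic crawler could pull message permalinks out of a page. The HTML in `SAMPLE_HTML` is made up for illustration (I don't know Gitter's actual markup), and `PermalinkExtractor` / `extract_permalinks` are hypothetical names; the only assumption taken from the steps above is that each time tag is an ordinary link carrying an `at` query parameter.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

# Hypothetical markup: the real page structure is an assumption.
SAMPLE_HTML = """
<div class="chat">
  <div class="msg"><a href="/gitter/developers?at=610011abc9f8852a970e808e">18:15</a> hello</div>
  <div class="msg"><a href="/gitter/developers?at=610011abc9f8852a970e808f">18:16</a> world</div>
  <a href="/login">Sign in</a>
</div>
"""


class PermalinkExtractor(HTMLParser):
    """Collects absolute URLs of <a href> links that have an `at` query parameter."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.permalinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL, as a crawler would.
        absolute = urljoin(self.base_url, href)
        has_at = "at" in parse_qs(urlparse(absolute).query)
        if has_at and absolute not in self.permalinks:
            self.permalinks.append(absolute)


def extract_permalinks(html, base_url):
    """Return the de-duplicated message permalinks found in `html`."""
    parser = PermalinkExtractor(base_url)
    parser.feed(html)
    return parser.permalinks


if __name__ == "__main__":
    for url in extract_permalinks(SAMPLE_HTML, "https://gitter.im/gitter/developers"):
        print(url)
```

The point is that nothing here is chat-specific: the crawler only follows plain links, and each collected URL is then fetched and indexed like any other page (steps 3 and 4).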

Upvotes: 0
