user658182

Reputation: 2360

From a development perspective, how do the indeed.com URL structure and site work?

On the Webmasters Q&A site, I asked the following:

https://webmasters.stackexchange.com/questions/42730/how-does-indeed-com-make-it-to-the-top-of-every-single-search-for-every-single-c

However, I would like a little more information about this from a development perspective.

If you search Google for anything job-related, for example "Gastonia Jobs" (city + jobs), Indeed's results dominate the first page of Google, and the URLs you get back look like this:

indeed.com/l-Gastonia,-NC-jobs.html

I am assuming that the l stands for location in the URL structure. If you search for an industry-related job, or for jobs at a specific company, you get back something like the following (Microsoft jobs):

indeed.com/q-Microsoft-jobs.html
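Judging only from these two examples, the file name seems to follow a template of the form prefix-value-jobs.html. Here is a minimal sketch of how a server could parse such a path back into a search, assuming (hypothetically) that l means location, q means query, and hyphens stand in for spaces:

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern inferred from the two example URLs only:
# "<prefix>-<Value>-jobs.html", where prefix "l" = location, "q" = query.
SLUG_PATTERN = re.compile(r"^(?P<kind>[lq])-(?P<value>.+)-jobs\.html$")


def parse_slug(path: str) -> Optional[Tuple[str, str]]:
    """Return ("location" | "query", human-readable value), or None."""
    name = path.rsplit("/", 1)[-1]  # drop the host/directory part
    match = SLUG_PATTERN.match(name)
    if match is None:
        return None  # not a search landing page
    kind = "location" if match.group("kind") == "l" else "query"
    # Hyphens stand in for spaces; commas such as in "Gastonia,-NC" are kept.
    value = match.group("value").replace("-", " ")
    return kind, value


print(parse_slug("indeed.com/l-Gastonia,-NC-jobs.html"))
print(parse_slug("indeed.com/q-Microsoft-jobs.html"))
```

Nothing in the path is specific to Gastonia or Microsoft, so a single handler could serve every such URL. One limitation of this guess: a value that really contains a hyphen cannot be told apart from a space.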

With just over 40,000 cities in the USA, I thought, OK, maybe they looped through them and created a page for every single one. That would not be hard for a computer. But the site is obviously dynamic, since each of those pages has tens of thousands of results, paginated 10 per page. The q above obviously stands for query. The locations I can understand, but they cannot possibly have created a web page for every single query combination, could they?
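The two halves of this reasoning can be sketched separately. A finite list of cities can be turned into landing-page URLs with a simple loop, and "paginated by 10" only needs an offset into whatever result list the database returns. The helper names and the page-clamping behaviour below are assumptions for illustration, not anything Indeed is known to do:

```python
import math
from typing import List

PAGE_SIZE = 10  # the question observes 10 results per page


def location_urls(cities: List[str]) -> List[str]:
    """One landing-page URL per city, in the l-<City>-jobs.html style."""
    return ["/l-" + city.replace(" ", "-") + "-jobs.html" for city in cities]


def page_window(total_results: int, page: int) -> range:
    """Indices of the results shown on 1-based page number `page`."""
    last_page = max(1, math.ceil(total_results / PAGE_SIZE))
    page = min(max(page, 1), last_page)  # clamp out-of-range page numbers
    start = (page - 1) * PAGE_SIZE
    return range(start, min(start + PAGE_SIZE, total_results))


print(location_urls(["Gastonia, NC"]))
print(page_window(25, 3))
```

Looping over 40,000 cities is cheap, and the paginated part never needs to exist as a file: each page is just a window (OFFSET/LIMIT in SQL terms) over a query result.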

OK, it gets a tad weirder. I wanted to see whether they had a sitemap, so I typed "indeed.com sitemap.xml" into Google and got back:

indeed.com/q-Sitemap-xml-jobs.html

Again, when I searched for "indeed.com url structure", as I mentioned in the other post on Webmasters, I got back:

indeed.com/q-change-url-structure-l-Arkansas.html
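Going the other way, the observed URLs look like what you would get by slugifying arbitrary text into a fixed template, which would let any search phrase map to a valid URL without a page having been written in advance. A minimal sketch, assuming that whitespace simply becomes hyphens (the real normalization evidently does more, since "sitemap.xml" became "Sitemap-xml"):

```python
import re


def slugify(text: str) -> str:
    """Collapse runs of whitespace into single hyphens; keep everything else."""
    return re.sub(r"\s+", "-", text.strip())


def search_url(query: str = "", location: str = "") -> str:
    """Build a path in the observed styles:
    q-<query>-jobs.html, l-<location>-jobs.html, or q-<query>-l-<location>.html
    """
    if query and location:
        return f"/q-{slugify(query)}-l-{slugify(location)}.html"
    if query:
        return f"/q-{slugify(query)}-jobs.html"
    if location:
        return f"/l-{slugify(location)}-jobs.html"
    raise ValueError("need a query, a location, or both")


print(search_url("change url structure", "Arkansas"))
print(search_url("Sitemap xml"))
```

With a template like this, "millions of pages" can mean nothing more than millions of possible strings that the same code can render on request.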

Is indeed.com somehow generating a web page on the fly from what I type into Google? If not, how can they have a static page for millions of possible query combinations, paginate them dynamically, and have all of them dominate Google's first page of results (although that last question may be better suited to the Webmasters Q&A)?

Does the JavaScript on the page somehow interact with the URL?

Upvotes: 3

Views: 2283

Answers (4)

Paris Vega

Reputation: 31

They also make clever use of rel="canonical" and thorough internal linking: http://www.indeed.com/find-jobs.jsp

Notice that all the pages that actually rank can be found from that direct internal link structure.
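As a rough illustration of the rel="canonical" part, a site can normalize each requested URL (dropping parameters that do not change the content, lowercasing the host, fixing parameter order) and declare that normalized form as canonical, so search engines consolidate duplicates onto one URL. The set of "tracking" parameters below is a hypothetical example, and sorting parameters assumes their order never matters:

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters that change the URL but not the page content.
TRACKING_PARAMS = {"referrer", "utm_source", "utm_medium", "utm_campaign"}


def canonical_url(url: str) -> str:
    """Drop tracking parameters, lowercase the host, sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    )
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


def canonical_tag(url: str) -> str:
    """The <link> element to place in the page's <head>."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("http://www.Indeed.com/q-Microsoft-jobs.html?referrer=google"))
```

The internal-linking half is separate: a directory page such as find-jobs.jsp gives the crawler a plain link path to each canonical URL.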

Upvotes: 0

Mannie Singh

Reputation: 129

It's easy: when Google's crawler requests pages on Indeed, or on any other job search site, those pages are created dynamically. Here is another site I run that works similarly to Indeed: http://jobuzu.co.uk

PHP is your friend here. Indeed doesn't just use standard databases; look into Sphinx and Solr, which offer full-text search with better performance than MySQL etc.
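The core data structure behind full-text engines such as Sphinx and Solr is an inverted index: a map from each word to the documents containing it, so a query is answered by intersecting a few small sets instead of scanning every row. The toy version below (in Python rather than PHP, and not the API of either engine) shows only that idea; real engines add ranking, stemming, phrase queries and on-disk storage:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Toy full-text index: maps each word to the ids of jobs containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        self.docs: Dict[int, str] = {}

    def add(self, job_id: int, text: str) -> None:
        self.docs[job_id] = text
        for word in tokenize(text):
            self.postings[word].add(job_id)

    def search(self, query: str) -> List[int]:
        """Ids of jobs containing every query word (AND semantics), sorted."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)


index = InvertedIndex()
index.add(1, "Product Manager, New York")
index.add(2, "Software Engineer at Microsoft")
index.add(3, "Product Designer, New York")
print(index.search("product manager new york"))
```

A dynamically generated page like q-Microsoft-jobs.html would then only need one such lookup plus a page-sized slice of the hits.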

Upvotes: 0

user2157846

Reputation: 11

This is a great question, but it remains unanswered, given that a basic Google search using

site:indeed.com

returns over 120 million results, and a query such as "product manager new york" ranks #1. These pages are obviously pre-generated, which is confirmed by the fact that the copy cached by the search engine (sometimes several days earlier) shows different results from a live query on the site.
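One way pre-generated pages could behave is as stored snapshots that are re-rendered only once they are older than some maximum age, so a visitor (or crawler) can see results that lag behind the live search. The sketch below is hypothetical throughout; note also that the cached-copy observation is consistent with Google simply storing whatever a fully dynamic page returned at crawl time:

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageSnapshotCache:
    """Serve pre-rendered HTML, regenerating it once it is older than max_age."""

    def __init__(self, render: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.render = render    # function that builds the HTML for a path
        self.max_age = max_age  # seconds a snapshot may be served
        self.clock = clock      # injectable so the behaviour can be tested
        self.snapshots: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self.clock()
        cached: Optional[Tuple[float, str]] = self.snapshots.get(path)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]  # still within max_age: serve the stored snapshot
        html = self.render(path)  # missing or too old: render from live data
        self.snapshots[path] = (now, html)
        return html
```

For example, with a fake clock and a renderer that lists the current jobs:

```python
fake_now = [0.0]
jobs = ["A"]
cache = PageSnapshotCache(lambda p: p + ":" + ",".join(jobs),
                          max_age=60, clock=lambda: fake_now[0])
cache.get("/q-x.html")   # renders "/q-x.html:A"
jobs.append("B")
fake_now[0] = 30
cache.get("/q-x.html")   # still the old snapshot
fake_now[0] = 61
cache.get("/q-x.html")   # re-rendered with the new job
```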

Upvotes: 1

InanisAtheos

Reputation: 632

It's most likely not a bunch of static pages. The "actual" page might be http://indeed.com/?referrer=google&searchterm=jobs%20in%20washington. The site then cleverly produces a human-readable URL using URL rewriting, fetches the jobs in the database that match the query, and voilà...
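The rewrite idea can be sketched as a single pattern that maps the pretty path onto the internal query URL, in the spirit of an Apache mod_rewrite rule. The referrer and searchterm parameter names come from the hypothetical URL above; joining query and location with "in" is an extra assumption made only to reproduce that example's shape:

```python
import re
from typing import Optional
from urllib.parse import quote, urlencode

# Hypothetical rule: /q-<query>-l-<location>.html -> /?referrer=...&searchterm=...
PRETTY_URL = re.compile(r"^/q-(?P<query>.+)-l-(?P<location>[^/]+)\.html$")


def rewrite(path: str, referrer: str = "google") -> Optional[str]:
    """Map a pretty search path to the internal URL, or None if it doesn't match."""
    match = PRETTY_URL.match(path)
    if match is None:
        return None  # not a pretty search URL; serve the path unchanged
    query = match.group("query").replace("-", " ")
    location = match.group("location").replace("-", " ")
    params = {"referrer": referrer, "searchterm": f"{query} in {location}"}
    # quote_via=quote encodes spaces as %20, matching the example URL above.
    return "/?" + urlencode(params, quote_via=quote)


print(rewrite("/q-change-url-structure-l-Arkansas.html"))
```

The visitor and the crawler only ever see the pretty path; the internal URL exists solely inside the server.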

I could be dead wrong, of course. Truth be told, the technical side of this can probably be solved in many ways. For example, every time a job is added to the site, all the pages that should list that job might be created, producing an enormous number of pages for Google to crawl.
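That fan-out approach can be sketched as a function that lists every page a new job should appear on and marks those pages for (re)generation. Which pages count as "matching" is a hypothetical choice here (the location page, one query page per title word, and the full title, each with and without the location); the point is only that every job multiplies into several URLs:

```python
from typing import Set


def slug(text: str) -> str:
    """'Gastonia, NC' -> 'Gastonia,-NC' (whitespace runs become hyphens)."""
    return "-".join(text.split())


def pages_for_job(title: str, location: str) -> Set[str]:
    """Every listing page a new job should appear on, in the observed URL style."""
    loc = slug(location)
    pages = {f"/l-{loc}-jobs.html"}  # the location landing page
    # Hypothetical choice: one query page per title word plus the full title,
    # each with and without the location.
    for term in title.split() + [title]:
        pages.add(f"/q-{slug(term)}-jobs.html")
        pages.add(f"/q-{slug(term)}-l-{loc}.html")
    return pages


dirty_pages: Set[str] = set()  # pages waiting to be (re)rendered and crawled


def on_job_added(title: str, location: str) -> None:
    """Queue every page affected by a newly added job."""
    dirty_pages.update(pages_for_job(title, location))


on_job_added("Product Manager", "New York")
print(sorted(dirty_pages))
```

A single "Product Manager" job in New York already touches seven pages under these rules, which shows how quickly the page count grows.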

Upvotes: 1
