Reputation: 15420
I am building a "share a link" feature like Facebook's. Currently I parse meta tags to get keywords, descriptions, etc., but how should I handle pages like http://en.wikipedia.org/wiki/Wikipedia? There is no meta description for this page, but Facebook still fetches the following description: Wikipedia ( /ˌwɪkɪˈpiːdi.ə/ or /ˌwɪkiˈpiːdi.ə/ WIK-i-PEE-dee-ə) is a free,[3] web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Its 17 million articles (over 3.4 million in English) have been written collaboratively by volunteers around the
How can I extract such a description when the page has no meta description tag?
Upvotes: 2
Views: 1460
Reputation: 1552
Amazon faces a similar problem, and has a fairly novel solution. Obviously, it's not perfect, but by marrying it to the idea that Bing uses, I'd bet you could get some pretty solid and interesting keyword tags auto-generated to go with the inherently more suspect description.
So it'd look like:
Description from meta
Interesting sentences according to Bing/Google
STP as tags, with hover-over for context.
I think that, in all likelihood, this is like nuking a fly.
It would oversolve your problem to a ridiculous degree.
Upvotes: 1
Reputation: 9391
If you want a program that gives you a good description of an arbitrary website, you'll need nothing less than a full-fledged AI, one that could possibly even pass a Turing test. So the short answer is: you can't.
If you are willing to pay a human to write a summary of a webpage for you, search for "Microjobs". You can create an automated job description like "Write a two sentence summary about webpage XY" and put a few cents behind it.
Of course you could try to find the first paragraph of text and take the first N sentences out of it, but that will fail on a lot of websites.
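The first-paragraph fallback mentioned above could be sketched roughly like this, using only the Python standard library. This is a minimal sketch, not how Facebook does it: the names `ParagraphCollector` and `describe`, the 40-character minimum, and the naive sentence split are all assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the meta description (if any) and the text of each <p>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.meta_description = None
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def flush_paragraph(self):
        # Store the paragraph collected so far, with whitespace collapsed.
        if self._in_p:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.meta_description = (attrs.get("content") or "").strip() or None
        elif tag == "p":
            self.flush_paragraph()  # tolerate an unclosed previous <p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush_paragraph()

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def describe(html, max_sentences=2, min_length=40):
    """Return the meta description, else the first N sentences of the
    first paragraph that is at least min_length characters long."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()
    if parser.meta_description:
        return parser.meta_description
    for paragraph in parser.paragraphs:
        if len(paragraph) >= min_length:
            # Naive split: breaks after ., ! or ? followed by whitespace.
            sentences = re.split(r"(?<=[.!?])\s+", paragraph)
            return " ".join(sentences[:max_sentences])
    return None
```

The length threshold is only there to skip stub paragraphs such as captions or notices. As said, this will still fail on plenty of sites (navigation text in `<p>` tags, content rendered by JavaScript, abbreviations that confuse the sentence split).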
Upvotes: 1
Reputation: 29976
It looks like they generate the description the same way Bing does, which might be difficult to re-create:
How does Bing generate a description of my Web site?
The way you design your Web page content has the greatest impact on your Web page description. As MSNBot crawls your Web site, it analyzes the content on indexed Web pages and generates keywords to associate with each Web page. MSNBot extracts Web page content that is most relevant to the keywords, and constructs the Web site description that appears in search results. The Web page content is typically sentence segments that contain keywords or information in the description tag. The Web page title and URL are also extracted and appear in the search results.
If you change the contents of a Web page, your Web page description might change the next time the Bing index is updated. To influence your Web site description, make sure that your Web pages effectively deliver the information you want in the search results. Webmaster Center recommends the following strategies when you design your content:
* Place descriptive content near the top of each Web page.
* Make sure that each Web page has a clear topic and purpose.
* Create unique <title> tag content for each page.
* Add a Web site description <meta> tag to describe the purpose of each page on your site. For example:
> <META NAME="Description"
> CONTENT="Sample text - describe your
http://www.bing.com/toolbox/support/faqs.aspx
One option would be to hit Bing and try to fetch the description from there.
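If you would rather approximate the idea locally, the FAQ's description (generate keywords, then pick sentence segments that contain them) can be imitated very crudely. This is only a sketch under my own assumptions, not Bing's actual algorithm: the function name `keyword_snippet`, the tiny stopword list, and the plain frequency count are all made up for illustration, and the input is assumed to be text already extracted from the page.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "was", "were", "are", "for", "with", "that",
             "this", "from", "have", "has", "its", "but", "not"}


def keyword_snippet(text, max_sentences=2, top_keywords=5):
    """Pick the sentences that contain the most of the page's frequent words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]
    # "Keywords" here are simply the most frequent non-stopwords.
    keywords = {w for w, _ in Counter(words).most_common(top_keywords)}

    def score(sentence):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        return len(tokens & keywords)

    # Rank by score (highest first), breaking ties by position in the text.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    # Emit the chosen sentences in their original order.
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Raw frequency is a weak stand-in for real keyword extraction, so treat the output as a candidate description to use when no meta description exists, not as something comparable to what a search engine produces.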
Upvotes: 2