Reputation: 3391
I'm trying to identify the tags of a given URL.
Is there a convention for how sites specify tags, or a heuristic based on common usage?
I'm referring to in-site tags that categorize a site's content. For example, each TC article has a 'tags' section at the end, and the same goes for most content sites.
Upvotes: 0
Views: 76
Reputation: 1234
I hope I understood your question: I believe you mean tags like 'html' and 'regex', such as those shown at the end of your question.
In theory, you could assume that pages mark their tag links with rel="tag".
Stack Overflow does it, and so do a few other sites I know of.
http://microformats.org/wiki/rel-tag
But I don't think it's very reliable: nothing requires sites to use it, so such tags are not guaranteed to be present.
Anyhow, if you want to try it and parse the content, I would not suggest doing it from scratch. Jsoup, for example, packs a lot of functionality into a compact library, and it can select links that carry specific attributes.
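To make the heuristic concrete, here is a minimal sketch that collects links whose rel attribute contains the token tag. It uses only Python's standard library rather than Jsoup, purely so that it runs without extra dependencies. The names RelTagCollector, extract_rel_tags and SAMPLE_HTML are made up for this illustration, and using the link text as the tag name is a simplifying assumption.

```python
from html.parser import HTMLParser


class RelTagCollector(HTMLParser):
    """Collects the text of <a> elements whose rel attribute contains 'tag'.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self):
        super().__init__()
        self.tags = []        # collected tag names, in document order
        self._buffer = None   # text pieces of the link being read, else None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        rel = dict(attrs).get("rel") or ""
        # rel is a space-separated token list, e.g. rel="nofollow tag"
        if "tag" in rel.lower().split():
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                # Assumption: the visible link text is the tag name.
                self.tags.append(text)
            self._buffer = None


def extract_rel_tags(html):
    """Return the link texts of all rel="tag" anchors found in html."""
    parser = RelTagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


# Made-up sample markup; a real page would be downloaded first.
SAMPLE_HTML = """
<p>Tags:
  <a href="/tags/html" rel="tag">html</a>
  <a href="/tags/regex" rel="nofollow tag">regex</a>
  <a href="/about">about</a>
</p>
"""

print(extract_rel_tags(SAMPLE_HTML))  # ['html', 'regex']
```

An empty result only means the page does not use rel="tag", not that it has no tags, so you would probably still need a site-specific fallback.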
Upvotes: 1