Noam

Reputation: 3391

Identifying article tags

I'm trying to identify the tags of the article at a given URL.

Is there any convention for how tags are specified? Any heuristic based on common usage?

I'm referring to in-site tags that categorize a site's content, e.g. at the end of each TC article you can find a 'tags' section. The same goes for most content sites.

Upvotes: 0

Views: 76

Answers (1)

Slomo

Reputation: 1234

I hope I understood your question. I believe you are referring to tags like 'html', 'regex' and so on, like the ones at the end of your question.

In theory, you could assume that pages mark their tag links with rel="tag". Stack Overflow does it, and so do a few other sites I know.

http://microformats.org/wiki/rel-tag

But I don't think it's very reliable: nothing requires sites to use it, so such markup is not guaranteed to be there.

Anyhow, if you want to try it and parse the content, I would not suggest doing it from scratch. Jsoup, for example, provides a lot of functionality in a very slick library. You can even use it to find link elements that have specific attributes.
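
To make the rel="tag" heuristic concrete, here is a minimal sketch. It uses Python's standard html.parser rather than Jsoup, only to stay dependency-free. The names RelTagParser and extract_rel_tags are made up for this illustration. The rule of taking the last path segment of the href as the tag name follows the rel-tag microformat linked above. Pages that don't use rel="tag" will simply yield nothing.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlsplit


class RelTagParser(HTMLParser):
    """Collects tag names from <a rel="tag" href="..."> links (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # rel is a space-separated token list, e.g. rel="nofollow tag"
        rel_tokens = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "tag" in rel_tokens and href:
            # Per rel-tag, the tag is the last path segment of the href,
            # not the visible link text.
            path = urlsplit(href).path.rstrip("/")
            name = unquote(path.rsplit("/", 1)[-1])
            if name and name not in self.tags:
                self.tags.append(name)


def extract_rel_tags(html):
    """Return the distinct rel-tag names found in an HTML string, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

You would feed it the downloaded page source, e.g. extract_rel_tags(page_html). For sites without rel="tag" you need a fallback, such as a site-specific selector for the 'tags' section.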

Upvotes: 1
