pauljwilliams

Reputation: 19225

Building a topic hierarchy for indexing content

I'm looking to build a topic map to categorize content.

For example, the topic 'Art' may have subcategories such as 'Art History', 'Painting', and 'Sculpture'.

I've crawled a few online resources, but I've hit a problem related to how I wish to use the hierarchy.

I've got a lot of content that I wish to index by topic. To continue the above example, if a user searches for 'Art' then they will get not only anything that mentions 'Art', but also anything that mentions 'Painting', even if it doesn't mention 'Art'. Fair enough.

But if, in another part of my hierarchy, I have 'House Maintenance', for example, then that might also have a subtopic of 'Painting'.

But then if a user searches for 'Art', my engine will say, 'Well, Painting is a subcategory of Art, so I'll include this piece of content that's all about the best colour to paint your bathroom walls.'

Has anyone come across this problem before? I've tried googling, but without knowing the exact terminology it's hard to make headway.

EDIT: More succinctly, 'Painting' is a subtopic of 'Art', but if something is about 'Painting' then it doesn't necessarily follow that it's about 'Art', since 'Art' is not the only parent of 'Painting'.

Upvotes: 5

Views: 215

Answers (6)

mins

Reputation: 7514

In "topic maps", as understood in the related standard, you can assign different "scopes" to a topic. So "painting" may be part of two scopes, with a different meaning in each.

A topic map: http://www.ontopia.net/page.jsp?id=vizigator

Scope: http://www.ontopia.net/topicmaps/materials/tao.html#stp-scope

Upvotes: 3

David

Reputation: 1841

Turning up late to this party (you've probably already built it or moved on or found an answer) but thought I'd throw in my 2 cents having worked on a high end Topic Map based CMS.

What you are missing in your description is how topics are linked together. Topics are linked together via Associations, which themselves have Types and Roles. So yes, painting would be a child of art and of house maintenance, but the two links would be of different kinds.

Defining your types and roles is really up to you; there are no hard and fast rules, it's just down to your own leanings. So

Topic: Art

Association: Source=Art, Reference=Painting, Type=Culture, Role=Practice

Topic: House Maintenance

Association: Source=House Maintenance, Reference=Painting, Type=DIY, Role=Activity

I suck at categorisation, but hopefully you can see what I'm getting at. You'd filter your searches based on the type and role. So if someone searched for art you'd return painting, and if you wanted to dig deeper and return co-related topics you'd return the Culture-associated topics and not the DIY-associated ones.

Topic Maps, if done right, are extremely flexible; you've also got scope and language baked in. You should be able to link the same topics together in a hundred different ways and see the data differently depending on your starting point.
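To make the filtering idea a bit more concrete, here is a minimal Python sketch. It is not taken from any Topic Maps engine: the `Association` class, the `search` function, and the sample documents are all made up for illustration, and it assumes each piece of content has already been tagged with a topic and an association type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    source: str      # the parent topic
    reference: str   # the topic being linked to
    assoc_type: str  # e.g. "Culture" or "DIY"
    role: str        # e.g. "Practice" or "Activity"


# The two associations from the example above.
ASSOCIATIONS = [
    Association("Art", "Painting", "Culture", "Practice"),
    Association("House Maintenance", "Painting", "DIY", "Activity"),
]

# Hypothetical content, tagged with a topic and an association type.
DOCUMENTS = [
    {"title": "Monet's water lilies", "topic": "Painting", "assoc_type": "Culture"},
    {"title": "Best colour for bathroom walls", "topic": "Painting", "assoc_type": "DIY"},
]


def search(topic, associations=ASSOCIATIONS, documents=DOCUMENTS):
    """Return titles tagged with the topic itself, or with a subtopic
    reached through an association of the matching type."""
    wanted = {(a.reference, a.assoc_type) for a in associations if a.source == topic}
    return [
        d["title"]
        for d in documents
        if d["topic"] == topic or (d["topic"], d["assoc_type"]) in wanted
    ]
```

With this data, `search("Art")` picks up only the Culture-typed Painting content, and `search("House Maintenance")` only the DIY-typed one.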

Upvotes: 2

dafmetal

Reputation: 785

If the Topic Map you are creating is built on Topic Maps technology, then subjectIdentifiers can be used to distinguish between two Topics with the same name (both named "Painting") that actually represent two different Subjects (Painting as an Art form, and Painting in the sense of home renovation).

If someone queries about Art and you drill down to Painting, then you can return only those entries related to 'Painting as an Art form' because those Painting entries are no longer thrown together on one heap.
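A rough Python sketch of that idea, without using any real Topic Maps library: the identifier URLs, the dictionaries, and the function names below are all invented for illustration. The point is only that the two "Painting" topics share a name but are keyed by different subject identifiers, so their content never ends up on the same heap.

```python
# Hypothetical subject identifiers; any stable, unique URIs would do.
ART = "http://example.org/subject/art"
HOUSE = "http://example.org/subject/house-maintenance"
PAINTING_ART = "http://example.org/subject/painting-as-art"
PAINTING_DIY = "http://example.org/subject/painting-as-renovation"

# Two different subjects share the display name "Painting".
NAMES = {
    ART: "Art",
    HOUSE: "House Maintenance",
    PAINTING_ART: "Painting",
    PAINTING_DIY: "Painting",
}

# Each subject has at most one parent in this simplified hierarchy.
PARENT = {PAINTING_ART: ART, PAINTING_DIY: HOUSE}

# Content is indexed by subject identifier, not by name.
CONTENT = {
    PAINTING_ART: ["Impressionist brushwork"],
    PAINTING_DIY: ["Choosing bathroom wall paint"],
}


def subtree(root):
    """Return root plus every subject whose parent chain leads to root."""
    found = {root}
    changed = True
    while changed:
        changed = False
        for child, parent in PARENT.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def content_for(root):
    """Collect content attached to root or any of its descendants."""
    return sorted(item for subject in subtree(root) for item in CONTENT.get(subject, []))
```

Here `content_for(ART)` drills down only into 'Painting as an Art form', even though both Painting topics carry the same name.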

Upvotes: 2

Tony Collen

Reputation: 111

Information Architecture for the World Wide Web would give you a good start on organizing information... it's a good read, but might not be so technically detailed.

Upvotes: 1

Chris Tonkinson

Reputation: 14469

Since you want to process House/Painting and Art/Painting differently, then it seems like you'll need two distinct entries for Painting (one for each meaning). Which one you associate a given 'lump of text' with could be based on context clues from the text itself, if your text processor is powerful enough.

For example, whenever you have a conflict like this, look in the text - do you see other words there? Like 'sink', 'wall', 'hard wood', or 'windows'? Or do you see other terms like 'Monet', 'impressionism', 'canvas', and 'gallery'? That'll allow you to automate the decision, and should be fairly accurate. The only snag is that this presumes you have a fairly healthy dictionary of 'related terms' lying around somewhere.
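As a minimal sketch of that automated decision, assuming a hand-built dictionary of related terms (the `RELATED_TERMS` table and the `classify` function are hypothetical, and a real system would want a much larger vocabulary plus phrase matching for terms like 'hard wood'):

```python
import re

# Hypothetical 'related terms' dictionary, seeded with single-word examples.
RELATED_TERMS = {
    "Art/Painting": {"monet", "impressionism", "canvas", "gallery"},
    "House Maintenance/Painting": {"sink", "wall", "windows"},
}


def classify(text):
    """Pick the Painting entry whose related terms overlap most with the text.

    Returns None when no related term appears at all. On a tie, the entry
    listed first in RELATED_TERMS wins, which is an arbitrary choice.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {topic: len(words & terms) for topic, terms in RELATED_TERMS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For instance, `classify("Paint the wall behind the sink")` lands on the House Maintenance entry, while text with no related terms returns `None` and could be flagged for manual review.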

On the user-end, when Painting is selected, you'd simply have to either merge all the results together, or present the user an option to select which parent topic they want to be viewing results from.

Upvotes: 0

chaos

Reputation: 124325

I don't know of a specific name for that, but I don't think it should really be a problem, either. All it calls for is that Art/Painting and House Maintenance/Painting are understood as separate entities. Someone searching for "art" gets subcategories of Art, so gets Art/Painting. Someone searching for "house maintenance" gets subcategories of House Maintenance, so gets House Maintenance/Painting. Someone searching for "painting" gets Art/Painting and House Maintenance/Painting, which is appropriate.
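In rough Python terms, assuming categories are stored as full paths (the `CATEGORIES` list and `matching_categories` function are just an illustration, not any particular engine's API):

```python
# Each category is identified by its full path, so the two Paintings are distinct.
CATEGORIES = [
    "Art",
    "Art/Painting",
    "Art/Sculpture",
    "House Maintenance",
    "House Maintenance/Painting",
]


def matching_categories(query):
    """Match a category if the query names it, names one of its ancestors'
    root paths, or names its last path segment."""
    q = query.lower()
    matches = []
    for path in CATEGORIES:
        lowered = path.lower()
        is_self = lowered == q
        is_descendant = lowered.startswith(q + "/")
        is_leaf_name = lowered.split("/")[-1] == q
        if is_self or is_descendant or is_leaf_name:
            matches.append(path)
    return matches
```

So `matching_categories("art")` stays inside the Art branch, while `matching_categories("painting")` returns both Painting entries.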

Upvotes: 0
