user2876812

Reputation: 336

Topic modeling using keywords for topics

I need to do topic modeling in the following manner:

For example:

I need to extract 5 topics from a single document. I have keywords for each of the 5 topics, and I need to extract the topics related to these keywords.

The keywords for the 5 topics are:
keyword 1 - (car, motorsport, ...)
keyword 2 - (accident, insurance, ...)
......

The corresponding output should be:
Topic 1 - (vehicle, torque, speed, ...)
Topic 2 - (claim, amount, ....)

How could this be done?

Upvotes: 0

Views: 1392

Answers (1)

Norman H

Reputation: 2262

A good place to start is lda, a latent Dirichlet allocation (LDA) topic-modelling library for Node.js:

https://www.npmjs.org/package/lda

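var lda = require('lda');
// Example document.
var text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';

// Extract sentences; each sentence is treated as a separate document.
var documents = text.match( /[^\.!\?]+[\.!\?]+/g );

// Run LDA to get terms for 2 topics (5 terms each).
var result = lda(documents, 2, 5);
// Print each topic, then each of its terms with its probability.
result.forEach(function (terms, i) {
  console.log('Topic ' + (i + 1));
  terms.forEach(function (t) {
    console.log(t.term + ' (' + t.probability + '%)');
  });
  console.log('');
});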
The above example produces output like the following, with two topics (topic 1 is "cat-related", topic 2 is "dog-related"). Exact values may differ between runs, since LDA starts from a random initialization:

Topic 1
cats (0.21%)
dogs (0.19%)
small (0.1%)
mice (0.1%)
chase (0.1%)

Topic 2
dogs (0.21%)
cats (0.19%)
big (0.11%)
eat (0.1%)
bones (0.1%)
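As far as the example above shows, the package takes no seed keywords, so it will not produce topics tied to the question's keyword lists on its own. One possible workaround is to run plain LDA and then label each resulting topic with the keyword list it overlaps most. The sketch below assumes each topic has the { term, probability } shape printed above; labelTopics, the label names, and the topic data are all hypothetical, not real lda output:

```javascript
// Hypothetical helper: label each LDA topic with the seed-keyword list whose
// words carry the most probability in that topic. Assumes each topic is an
// array of { term, probability } objects, as in the lda example above.
function labelTopics(topics, seedLists) {
  return topics.map((terms) => {
    let bestLabel = null; // stays null if no seed keyword appears in the topic
    let bestScore = 0;
    for (const [label, seeds] of Object.entries(seedLists)) {
      // Sum the probabilities of the topic's terms that are in this seed list.
      const score = terms
        .filter((t) => seeds.includes(t.term))
        .reduce((sum, t) => sum + t.probability, 0);
      if (score > bestScore) {
        bestScore = score;
        bestLabel = label;
      }
    }
    return { label: bestLabel, score: bestScore, terms: terms.map((t) => t.term) };
  });
}

// Hypothetical data in the same shape as the lda result (not real output).
const topics = [
  [
    { term: 'vehicle', probability: 0.2 },
    { term: 'car', probability: 0.15 },
    { term: 'speed', probability: 0.1 },
  ],
  [
    { term: 'claim', probability: 0.2 },
    { term: 'insurance', probability: 0.18 },
    { term: 'amount', probability: 0.1 },
  ],
];

// Seed keywords from the question, under hypothetical label names.
const seedLists = {
  motoring: ['car', 'motorsport'],
  insurance: ['accident', 'insurance'],
};

const labeled = labelTopics(topics, seedLists);
labeled.forEach((t) => console.log(t.label, t.terms.join(', ')));
```

This is post-hoc labelling only: it does not steer LDA toward the seeds. Because only the top terms of each topic are returned, asking for more terms per topic (the third argument to lda) makes it more likely that a seed keyword shows up; a topic that matches no list gets a null label.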

That should get you started. Note that you will likely have to experiment with the number of topics and with how you split the text into documents (for example, sentences versus paragraphs) to tune the results to the amount of information you want to extract.
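Since the question involves a single document, and LDA learns topics from how words co-occur across several documents, the text needs to be split into pieces first; the example above uses one sentence per document. A minimal sketch of a hypothetical helper, chunkDocument, that groups consecutive sentences instead, so the chunk size becomes one more knob to tune alongside the number of topics:

```javascript
// Hypothetical helper: split one long text into chunks of `sentencesPerChunk`
// consecutive sentences, so LDA sees several "documents" instead of one.
function chunkDocument(text, sentencesPerChunk) {
  if (!(sentencesPerChunk >= 1)) {
    throw new RangeError('sentencesPerChunk must be at least 1');
  }
  // Same sentence pattern as the example above; any text after the last
  // . ! or ? is dropped, just as it is there.
  const sentences = (text.match(/[^.!?]+[.!?]+/g) || []).map((s) => s.trim());
  const chunks = [];
  for (let i = 0; i < sentences.length; i += sentencesPerChunk) {
    chunks.push(sentences.slice(i, i + sentencesPerChunk).join(' '));
  }
  return chunks;
}

const text = 'Cats are small. Dogs are big. Cats like to chase mice. Dogs like to eat bones.';
console.log(chunkDocument(text, 2));
// → [ 'Cats are small. Dogs are big.', 'Cats like to chase mice. Dogs like to eat bones.' ]
```

The chunks can then be passed to the package with the same three-argument call as above, for example lda(chunkDocument(text, 3), 5, 10) for 5 topics with 10 terms each.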

This isn't magic.

http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
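For intuition on what such a library estimates: one common way to fit LDA is collapsed Gibbs sampling (whether the lda package uses exactly this scheme is an assumption here, not something checked against its source). Each word token i, in document d_i, is repeatedly reassigned to a topic k with probability

$$
P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \left(n^{-i}_{d_i,k} + \alpha\right)\frac{n^{-i}_{k,w_i} + \beta}{n^{-i}_{k} + V\beta}
$$

where $n_{d,k}$ counts the tokens in document $d$ assigned to topic $k$, $n_{k,w}$ counts how often word $w$ is assigned to topic $k$, $n_k$ is the total number of tokens assigned to topic $k$, $V$ is the vocabulary size, $\alpha$ and $\beta$ are the Dirichlet prior parameters, and the superscript $-i$ means the counts exclude token $i$ itself. After sampling, the probability of term $w$ within topic $k$ is estimated as

$$
\hat{\phi}_{k,w} = \frac{n_{k,w} + \beta}{n_k + V\beta}
$$

which is presumably the kind of per-term number printed in the output above. The $n_{d,k}$ factor is also why splitting a single document into several pieces matters: with only one document, every token shares the same document-topic counts.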

Upvotes: 1
