Reputation: 4579
The problem I'm facing is this: I want to read a document, get its raw string, and classify the information inside it. For example, I want to identify when a string is a "name", a "date", or some other useful piece of information.
Is it possible to use machine learning to do that? How may I approach the problem?
The hardest part is that I'm not trying to classify the document itself, but the string information inside the document.
Upvotes: 0
Views: 87
Reputation: 1476
It comes down to how you frame the problem. I think it can be formulated as an entity extraction/recognition problem: you have a document and want to identify specific entities within the text (an entity might be a person, a date, etc.). Take a look at Conditional Random Fields and their application to named entity recognition (NER for short); several libraries and tools already implement this.
For example, check out StanfordNER.
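To make the CRF framing a bit more concrete, here is a minimal sketch of the data preparation step, assuming whitespace-tokenized text and BIO-style labels. The feature names, the `NAME`/`DATE` label set, the date regex, and the sample sentence are all made up for illustration and are not taken from any particular library; a CRF toolkit would typically be trained on per-token feature dictionaries like these paired with the label sequence.

```python
import re

# Hypothetical pattern for dates such as 12/03/2015 or 1-2-15 (an assumption).
DATE_RE = re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$")


def token_features(tokens, i):
    """Build a feature dict for the token at position i, including its neighbours."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_title": tok.istitle(),            # capitalized words often start names
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "looks_like_date": bool(DATE_RE.match(tok)),
        "suffix3": tok[-3:],
    }
    # Context features: sequence models such as CRFs rely on neighbouring tokens.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens):
    """Feature dicts for every token in one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# One hand-labeled training example (invented): B- marks the beginning of an
# entity, I- marks its continuation, and O marks tokens outside any entity.
tokens = ["Invoice", "issued", "to", "Maria", "Silva", "on", "12/03/2015"]
labels = ["O", "O", "O", "B-NAME", "I-NAME", "O", "B-DATE"]

X = sentence_features(tokens)
for tok, lab, feats in zip(tokens, labels, X):
    print(tok, lab, feats["is_title"], feats["looks_like_date"])
```

The point of the design is that each token gets its own label, so you classify the strings inside the document rather than the document as a whole. You would need many labeled sentences like this one to train a model.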
Upvotes: 2