simplfuzz

Reputation: 12905

Self-learning solution for extracting multiple values from a given text

Let's say Message1 = "your bill of amount 121.0 is due on 15 Feb" and, similarly, Message2 = "bill amt 234.0 due on 11 Jun", and so on. I want to extract the bill amount and the due date from similar messages. One way is to write a regular expression for every possible format, but that cannot handle new formats.
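To make the limitation concrete, here is a minimal Python sketch of the hand-written approach: one regular expression per known format. The patterns are illustrative, and the third message (`Rs. 99.5 payable by 3 Mar`) is a hypothetical new format, not one from the question.

```python
import re

# Hypothetical hand-written patterns: one per message format seen so far.
PATTERNS = [
    re.compile(r"bill of amount (?P<amount>\d+(?:\.\d+)?) is due on (?P<due>\d{1,2} \w+)"),
    re.compile(r"bill amt (?P<amount>\d+(?:\.\d+)?) due on (?P<due>\d{1,2} \w+)"),
]


def extract(message):
    """Return (amount, due_date) from the first pattern that matches, else None."""
    for pattern in PATTERNS:
        match = pattern.search(message)
        if match:
            return float(match.group("amount")), match.group("due")
    return None


print(extract("your bill of amount 121.0 is due on 15 Feb."))  # (121.0, '15 Feb')
print(extract("bill amt 234.0 due on 11 Jun"))                 # (234.0, '11 Jun')
print(extract("Rs. 99.5 payable by 3 Mar"))                    # None: format not covered
```

Every new wording needs another entry in `PATTERNS`, which is exactly the maintenance problem described above.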

What is the machine learning approach to solving this? How do I train a model and use it to extract the amount and due date from new messages?

Upvotes: 0

Views: 94

Answers (1)

Wasi Ahmad

Reputation: 37741

To answer your question better, I need to know how the training data will be provided. Will you have a label for each training example? Do you want to use an advanced technique that involves deep neural networks?

For example, if you want to use sequence labeling, you can refer to Chapter 2 of Supervised Sequence Labelling with Recurrent Neural Networks by Alex Graves for more details. For your task, though, I think you can try a simpler approach first.
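Sequence labeling treats extraction as assigning a label (here `O`, `AMT` or `DATE`) to every token. The sketch below is a deliberately small stand-in, not the recurrent-network method from Graves's book: a multiclass perceptron that labels each token from two context features, the token's character shape (`121.0` becomes `d.d`) and the shape of the previous token. The label names, feature set, token-level training data and the test message are all assumptions for illustration.

```python
import re
from collections import defaultdict

LABELS = ["O", "AMT", "DATE"]  # "O" = token is not part of any field

# Hypothetical hand-labeled training data: one label per token.
TRAIN = [
    [("your", "O"), ("bill", "O"), ("of", "O"), ("amount", "O"), ("121.0", "AMT"),
     ("is", "O"), ("due", "O"), ("on", "O"), ("15", "DATE"), ("Feb", "DATE")],
    [("bill", "O"), ("amt", "O"), ("234.0", "AMT"), ("due", "O"), ("on", "O"),
     ("11", "DATE"), ("Jun", "DATE")],
]


def tokenize(text):
    # Numbers (with optional decimal part) or alphabetic words; punctuation is dropped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+", text)


def shape(token):
    # Digits -> "d", letters -> "x", runs collapsed: "121.0" -> "d.d", "Feb" -> "x".
    out = []
    for ch in token:
        c = "d" if ch.isdigit() else "x" if ch.isalpha() else ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)


def features(tokens, i):
    prev = shape(tokens[i - 1]) if i > 0 else "<s>"
    return [f"shape={shape(tokens[i])}", f"prev={prev}"]


def predict(weights, feats):
    # Highest-scoring label; ties go to the earliest label in LABELS.
    return max(LABELS, key=lambda lab: sum(weights[(f, lab)] for f in feats))


def train(data, max_epochs=50):
    weights = defaultdict(float)
    for _ in range(max_epochs):
        mistakes = 0
        for sentence in data:
            tokens = [tok for tok, _ in sentence]
            for i, (_, gold) in enumerate(sentence):
                feats = features(tokens, i)
                guess = predict(weights, feats)
                if guess != gold:  # perceptron update: reward gold, penalize guess
                    mistakes += 1
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
        if mistakes == 0:  # every training token is labeled correctly
            break
    return weights


def extract(weights, text):
    # Joins all tokens sharing a label (assumes one amount and one date per message).
    tokens = tokenize(text)
    fields = defaultdict(list)
    for i, tok in enumerate(tokens):
        label = predict(weights, features(tokens, i))
        if label != "O":
            fields[label].append(tok)
    return {label: " ".join(toks) for label, toks in fields.items()}


weights = train(TRAIN)
print(extract(weights, "Rs. 99.5 payable by 3 Mar"))  # {'AMT': '99.5', 'DATE': '3 Mar'}
```

The message in the last line shares no wording with the training data, yet the fields are found because the model learned from token shapes rather than fixed phrases. With shape features alone, an integer amount such as `500` looks like a day number, so a practical tagger would add word, neighbouring-word and month-name features and much more labeled data.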

For example, a pattern-mining or template-based approach should help here. Parsing techniques, e.g., dependency parsing, can also help in this context. See the difference between dependency parsing and constituency parsing.
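Patterns can also be learned from examples instead of written by hand. This minimal pattern-mining sketch, assuming a few hand-labeled messages, collects the word that immediately precedes each labeled value (`amount`/`amt` for the amount, `on` for the due date) and turns those cue words into regular expressions with typed slots. The field names, the slot formats in `SLOT_REGEX` and the extra test messages are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed value formats for each slot.
SLOT_REGEX = {
    "amount": r"\d+(?:\.\d+)?",
    "due": r"\d{1,2}\s+[A-Za-z]{3,}",
}

# Hypothetical labeled examples: message plus the field values it contains.
EXAMPLES = [
    ("your bill of amount 121.0 is due on 15 Feb.", {"amount": "121.0", "due": "15 Feb"}),
    ("bill amt 234.0 due on 11 Jun", {"amount": "234.0", "due": "11 Jun"}),
]


def mine_cues(examples, min_count=1):
    """For each field, keep words seen directly left of its value at least min_count times."""
    counts = defaultdict(Counter)
    for text, fields in examples:
        for name, value in fields.items():
            start = text.find(value)
            if start < 0:
                continue  # value does not appear verbatim; skip it
            left_words = text[:start].split()
            if left_words:
                counts[name][left_words[-1].lower()] += 1
    return {name: {word for word, n in counter.items() if n >= min_count}
            for name, counter in counts.items()}


def build_patterns(cues):
    """One regex per field: any mined cue word followed by a value of the slot's type."""
    patterns = {}
    for name, words in cues.items():
        if not words:
            continue
        alternatives = "|".join(sorted(map(re.escape, words)))
        patterns[name] = re.compile(
            rf"\b(?:{alternatives})\s+(?P<value>{SLOT_REGEX[name]})", re.IGNORECASE)
    return patterns


def extract(patterns, text):
    result = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            result[name] = match.group("value")
    return result


patterns = build_patterns(mine_cues(EXAMPLES))
print(extract(patterns, "Total amount 75.5, payable on 3 Mar"))  # {'amount': '75.5', 'due': '3 Mar'}
print(extract(patterns, "Rs. 99.5 payable by 3 Mar"))            # {}
```

Adding a labeled example whose due date follows a new cue word (say, `by`) extends the patterns without editing any regex by hand, but a message whose cues were never seen still yields nothing, so this approach only generalizes as far as its examples do.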

Finally, you can also consider well-known information extraction techniques for this scenario. See how NLTK is used for this.

Upvotes: 1
