varun r

Reputation: 1

Is there any function or module in NLP that would find specific paragraph headings?

I have a text file. I need to identify specific paragraph headings and, when one is found, extract the tables and paragraphs that belong to that heading, using Python. Can this be done with NLP or machine learning? If so, please help me gather the basics, as I am new to this field. I was thinking of using a rule like:

if (capitalized) and heading_length <50: return heading_text

How do I parse through the entire document and pick out only the heading names? This is like automating the manual work of opening a document, scrolling to the relevant subject, and picking it out.

Please help me out with this.

Upvotes: 0

Views: 976

Answers (2)

Jie Pan - Drive Only

Reputation: 41

I agree with lorg. You could use NLP, but that might just complicate the problem. It could become an optimization problem if performance is a concern.

Upvotes: 0

lorg

Reputation: 1170

You probably don't need NLP or machine learning to detect these headings. Figure out the rule you actually want, and if it really is as simple as the one you wrote, a regexp will be sufficient. If your text is formatted (e.g. using HTML), it might be even simpler.
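To make the regexp idea concrete, here is a minimal sketch rather than a definitive implementation. It assumes "capitalized" means an all-uppercase line and keeps the under-50-characters limit from the question. The names `HEADING_RE`, `is_heading` and `split_sections` are made up for illustration. It walks the file line by line, treats every matching line as a heading, and collects the lines that follow (paragraphs or table rows) under that heading.

```python
import re

# Assumed rule (from the question): a heading is a line with no lowercase
# letters, at least one uppercase letter, and fewer than 50 characters.
HEADING_RE = re.compile(r"^(?=.*[A-Z])[^a-z\n]{1,49}$")


def is_heading(line: str) -> bool:
    """Return True if the stripped line satisfies the assumed heading rule."""
    return bool(HEADING_RE.match(line.strip()))


def split_sections(text: str) -> dict:
    """Map each detected heading to the text that follows it.

    Lines before the first heading are ignored. If a heading occurs twice,
    the later occurrence replaces the earlier one.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if is_heading(stripped):
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {heading: "\n".join(body).strip() for heading, body in sections.items()}


if __name__ == "__main__":
    sample = (
        "INTRODUCTION\n"
        "Some text here.\n"
        "More text.\n"
        "\n"
        "RESULTS\n"
        "name | value\n"
        "x | 1\n"
    )
    for heading, body in split_sections(sample).items():
        print(heading)
        print(body)
        print("---")
```

To pull out one specific heading, look it up in the returned dict, e.g. `split_sections(text).get("RESULTS")`. Note that this rule will also match short all-caps lines that are not headings, such as a table header row written in capitals, so you would need to tighten the pattern for your actual documents.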

If, however, you can't find a rule and your text isn't formatted consistently, your problem will be hard to solve.

Upvotes: 1
