Bharath M Shetty

Reputation: 30605

Finding similar text in a string in Python

I have a .txt file containing this text:

Table of Contents

Preface 1

Chapter 1: Tokenizing Text and WordNet Basics 7

Tokenizing text into sentences 8

Tokenizing sentences into words 10

Tokenizing sentences using regular expressions 12

The string I have is:

input = "Tokenzing sentence using expressions"

I thought of using the first and last words to extract the line, but those words are repeated across many lines.

So what's the best way to get this output?

Tokenizing sentences using regular expressions

Upvotes: 1

Views: 1418

Answers (1)

BoarGules

Reputation: 16942

If you are prepared to preprocess your chapter headings (removing page numbers, chapter labels, and so on), this:

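import difflib

# Headings with page numbers and the "Chapter 1:" label already removed
contents = ["Tokenizing Text and WordNet Basics",
            "Tokenizing text into sentences",
            "Tokenizing sentences into words",
            "Tokenizing sentences using regular expressions"]
# Named `query` rather than `input` to avoid shadowing the built-in input()
query = "Tokenzing sentence using expressions"
# n=1 returns a list holding only the single closest match
print(difflib.get_close_matches(query, contents, n=1))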

will give you this output:

['Tokenizing sentences using regular expressions']
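The preprocessing step mentioned above could be sketched roughly as follows. This is only one possible approach, not something from the original post: the helper name `clean_heading`, the two regular expressions, and the inline `raw_text` sample (standing in for the contents of the .txt file) are all assumptions made for illustration. It assumes each heading sits on its own line, optionally prefixed with "Chapter N:" and optionally followed by a page number.

```python
import difflib
import re


def clean_heading(line):
    """Hypothetical helper: drop a trailing page number and a leading 'Chapter N:' label."""
    line = re.sub(r"\s+\d+$", "", line.strip())      # "Preface 1" -> "Preface"
    return re.sub(r"^Chapter\s+\d+:\s*", "", line)   # "Chapter 1: X" -> "X"


# Stand-in for the text file; in practice this would be read from disk.
raw_text = """Table of Contents
Preface 1
Chapter 1: Tokenizing Text and WordNet Basics 7
Tokenizing text into sentences 8
Tokenizing sentences into words 10
Tokenizing sentences using regular expressions 12
"""

# Clean every non-blank line.
contents = [clean_heading(line) for line in raw_text.splitlines() if line.strip()]

query = "Tokenzing sentence using expressions"
matches = difflib.get_close_matches(query, contents, n=1)

# get_close_matches returns an empty list when nothing reaches the cutoff.
best = matches[0] if matches else None
print(best)
```

get_close_matches also takes a cutoff argument (default 0.6); candidates scoring below it are dropped, so raising it makes matching stricter and lowering it makes matching more forgiving of typos.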

Upvotes: 4
