Extracting specific information from text

Question

I'd like to extract some data from a text file. I've decided to do it using the Natural Language Toolkit (NLTK), but I'm open to suggestions if there is a better way to do this.

Here is an example:

I need a flight from New York NY to San Francisco CA.

From this text, I'd like to get the city and state for both the origin and the destination.

Here is what I have so far:

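import nltk
from nltk.text import Text
from nltk.corpus import PlaintextCorpusReader

def readfiles():
    # Raw string so the backslashes in the Windows path are taken literally
    corpus_root = r'C:\prototype\emails'
    # Read every file under corpus_root
    w = PlaintextCorpusReader(corpus_root, '.*')
    t = Text(w.words())

    # concordance() prints its matches itself and returns None,
    # so it is called directly rather than passed to print
    print("--- to ----")
    t.concordance("to")

    print("--- from ----")
    t.concordance("from")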

I can read the text from some input (a file in my case) and then use the concordance method to find every occurrence of a word. I want to extract the city and state information that comes after 'to' and 'from'.

The question is: what is the best way to look at the text that follows each instance of 'to' and 'from'?
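For reference, here is a minimal sketch of the kind of result I'm after, using a plain regular expression instead of NLTK. It assumes every location is written as one or more capitalized words followed by a two-letter uppercase state code (as in "New York NY"), and that 'from' comes before 'to' in the sentence. The names LOCATION, PATTERN and extract_route are made up for this illustration.

```python
import re

# Assumption: a location is one or more capitalized words followed by a
# two-letter uppercase state code, e.g. "New York NY".
LOCATION = r"((?:[A-Z][a-z]+\s)+)([A-Z]{2})\b"

# Assumption: the origin follows 'from' and the destination follows a later 'to'.
PATTERN = re.compile(r"\bfrom\s+" + LOCATION + r".*?\bto\s+" + LOCATION)


def extract_route(sentence):
    """Return ((origin_city, origin_state), (dest_city, dest_state)), or None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    origin_city, origin_state, dest_city, dest_state = match.groups()
    # The city groups keep a trailing space from the pattern, so strip it.
    return (origin_city.strip(), origin_state), (dest_city.strip(), dest_state)


if __name__ == "__main__":
    print(extract_route("I need a flight from New York NY to San Francisco CA."))
```

This would not handle city names such as "St. Louis", lowercase input, or sentences where 'to' comes before 'from', which is part of why I'm asking whether NLTK offers something more robust.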
