Dharight

Reputation: 71

Is there a way to identify cities in a text without maintaining a prior vocabulary, in Python?

I need to identify cities in a document (it contains only characters). I do not want to maintain an entire vocabulary, as that is not a practical solution. I also do not have an Azure Text Analytics API account.

I have already tried spaCy: I ran NER to identify geolocations and passed that output to spellchecker() to train the model. The issue is that NER requires sentences, while my input consists of individual words.

I am relatively new to this field.

Upvotes: 3

Views: 704

Answers (2)

Ankur Sinha

Reputation: 6639

You can check out the geotext library.

Working example with a sentence:

from geotext import GeoText

text = "The capital of Belarus is Minsk. Minsk is not so far away from Kiev or Moscow. Russians and Belarussians are nice people."

places = GeoText(text)
print(places.cities)

Output:

['Minsk', 'Minsk', 'Kiev', 'Moscow']

Working example with a list of words:

wordList = ['London', 'cricket', 'biryani', 'Vilnius', 'Delhi']

# Check each word on its own and print any city names found
for word in wordList:
    places = GeoText(word)
    if places.cities:
        print(places.cities)

Output:

['London']
['Vilnius']
['Delhi']


geograpy is another alternative. However, I find geotext lighter because it has fewer external dependencies.

Upvotes: 4

kederrac

Reputation: 17322

There is a list of libraries that may help you, but in my experience there is no perfect library for this. If you know all the cities that may appear in the text, then a vocabulary is the best option.
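
For illustration, a vocabulary lookup could look roughly like the sketch below. The names KNOWN_CITIES and find_cities are made up for this example, and the tiny hard-coded set stands in for whatever city list you would actually load (from a file or a database, say). It only matches single entries in the word list, ignoring case.

```python
# Hypothetical vocabulary; in practice this would be loaded from a file or database.
KNOWN_CITIES = {"london", "vilnius", "delhi"}


def find_cities(words, vocabulary=KNOWN_CITIES):
    """Return the words that appear in the vocabulary, ignoring case and outer whitespace."""
    return [word for word in words if word.strip().lower() in vocabulary]


word_list = ["London", "cricket", "biryani", "Vilnius", "delhi"]
print(find_cities(word_list))  # ['London', 'Vilnius', 'delhi']
```

A set gives constant-time membership checks on average, so this stays fast even with a large city list. The obvious cost is the one raised in the question: the list has to be maintained.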

Upvotes: 0
