Reputation: 650
I have a list of strings (sentences) that might contain one or more Dutch city names. I also have a list of Dutch cities and their various spellings. I am currently working in Python, but a solution in another language would also work.
What would be the best and most efficient way to retrieve a list of cities mentioned in the sentences?
What I do at the moment is loop through the sentence list, and then within that loop, loop through the cities list and check one by one whether place_name in sentence.lower(), so I have:
for sentence in sentences:
    for place_name in place_names:
        if place_name in sentence.lower():
            places[place_name] = places[place_name] + 1
Is this the most efficient way to do this? I also run into the problem that cities like "Ee" exist in Holland, and that words with "ee" in them are quite common. For now I have solved this by checking for place_name + ' ' in sentence.lower() instead, but that is of course suboptimal and ugly: it disregards sentences like "Huis in Amsterdam", since the sentence doesn't end with a space, and it doesn't handle punctuation well either. I tried using regex, but that is of course way too slow. Would there be a better way to solve this particular problem, or to solve this problem in general? I am leaning somewhat towards an NLP solution, but I also feel like that would be massive overkill.
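For reference, the regex direction I could fall back on would be a single precompiled alternation with word boundaries, rather than a separate pattern or check per city; a rough sketch with made-up sample data (the real lists are much longer):

import re
from collections import Counter

# Made-up sample data standing in for the real sentences / place_names.
place_names = ["amsterdam", "ee", "den haag"]
sentences = ["Huis in Amsterdam", "Een mooi meer in Ee.", "Geen stad hier"]

# One precompiled alternation with \b word boundaries; longest names first so
# multi-word names are preferred over shorter names they contain.
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, sorted(place_names, key=len, reverse=True))) + r")\b",
    re.IGNORECASE,
)

places = Counter()
for sentence in sentences:
    for match in pattern.findall(sentence):
        places[match.lower()] += 1

print(places)  # Counter({'amsterdam': 1, 'ee': 1})

This avoids the substring problem ("Een" and "meer" no longer count as matches for "Ee"), but names that start with punctuation, such as 's-Hertogenbosch, would still need extra care, since \b only fires between a word character and a non-word character.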
Upvotes: 1
Views: 6280
Reputation: 473853
You may look into Named Entity Recognition solutions in general. This can be done in nltk as well, but here is a sample in spaCy - cities would be marked with GPE labels (GPE stands for "Geopolitical Entity": countries, states, cities, etc.):
import spacy
nlp = spacy.load('en_core_web_lg')
doc = nlp(u'Some company is looking at buying an Amsterdam startup for $1 billion')
for ent in doc.ents:
    print(ent.text, ent.label_)
Prints:
Amsterdam GPE
$1 billion MONEY
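Since your sentences are Dutch and you already have a whitelist of spellings, one way to tie the NER output back to your own place_names list could look roughly like this (the sample data is made up; spaCy also ships Dutch pipelines such as nl_core_news_sm, which I believe tag locations as LOC rather than GPE):

from collections import Counter
import spacy

# Made-up sample data standing in for your sentences / place_names.
sentences = ["Some company is looking at buying an Amsterdam startup for $1 billion"]
place_names = {"amsterdam", "rotterdam", "den haag"}

nlp = spacy.load('en_core_web_lg')

places = Counter()
for doc in nlp.pipe(sentences):  # nlp.pipe processes the sentences in batches
    for ent in doc.ents:
        if ent.label_ == "GPE" and ent.text.lower() in place_names:
            places[ent.text.lower()] += 1

print(places)  # Counter({'amsterdam': 1})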
Upvotes: 5