Shlomi Schwartz

Reputation: 8913

Building a geolocation photo index - crawling the web or relying on an existing API?

I'm developing a geo-location service which requires a photo per POI, and I'm trying to figure out how to match the right photo to a given location.

I'm looking for an image that will give an overview for the location rather than some arbitrary image from a given coordinate.

For example, when searching for "nyc" in Google you get the following image, sourced from http://www.filmsofcrawford.com/talesofnyctours/

[Image: Google's overview image result for "nyc"]

Of course Google is Google, but I've found a similar approach on other sites, for example: https://roadtrippers.com/us/san-francisco-ca/attractions/conservatory-of-flowers?lat=37.81169&lng=-122.69478&z=11&a2=p!5


Q: For an index like [POI NAME] -> [Overview image URL], what would be your approach (crawling, an API, etc.)?

Please add your thoughts :)

Upvotes: 12

Views: 490

Answers (3)

vorillaz

Reputation: 6276

I would highly suggest using an existing API. Matching images with locations is quite hard to achieve. In my view, the Google Images search API returns too many irrelevant results: it's built that way, processing images based on meta tags and ranking results by SEO signals.

If you're still considering building a web crawler, take a look at Scrapy; it's open source, well documented, and pretty stable.
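Whichever framework you choose, the core of an image crawler is the same: fetch a page and pull out the image URLs. A minimal standard-library sketch (class name hypothetical; Scrapy would additionally handle fetching, throttling, and link following for you):

```python
from html.parser import HTMLParser

class ImageLinkExtractor(HTMLParser):
    """Collects the src attribute of every <img> tag in a page."""
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

# Usage: feed it HTML fetched for a POI page.
parser = ImageLinkExtractor()
parser.feed('<html><body><img src="/photos/nyc.jpg" alt="NYC"></body></html>')
print(parser.image_urls)  # ['/photos/nyc.jpg']
```

This only covers extraction; the hard parts the answer warns about (site format changes, dead links, storage) are exactly what you'd still have to build around it.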

You should also take a look at other open APIs that provide location-based queries. Some examples:

  • Foursquare has a great API; you can fetch results by querying each city as an endpoint.
  • Instagram uses the Foursquare API to map images to locations. Its popularity should be considered.
  • Flickr has well-curated image results. Give it a try as well, since you can index images based on the license you need.
  • Google Places provides an API too. I have never worked with this service, but I thought I should add it to the list.
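As a sketch of the Flickr option, a location-scoped query maps to a flickr.photos.search call. The helper below only builds the request URL; the function name, license IDs, and sort choice are assumptions, so check the Flickr API docs for the exact parameters you need:

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def flickr_search_url(api_key, lat, lon, license_ids="4,5"):
    """Build a flickr.photos.search URL scoped to a coordinate.
    license_ids preselects reusable Creative Commons licenses (assumption)."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "license": license_ids,
        "sort": "interestingness-desc",  # bias toward "overview"-style shots
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)

print(flickr_search_url("YOUR_KEY", 37.81169, -122.69478))
```

Sorting by interestingness is one way to approximate the "overview image" requirement from the question, since it surfaces popular, well-composed shots over arbitrary snapshots.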

Upvotes: 3

heartyporridge

Reputation: 1201

Writing your own image crawler would not be an easy task. What happens if your target sites change their format or terms of use, take down links, or replace an image altogether? There's a great answer on Quora regarding the complexity of web crawlers, and even if you simplify things by narrowing your sources down to a small list of sites, you'll have to figure out how to process images, not text. That might entail saving hundreds of images locally for processing, which won't be fun to maintain.

I would strongly suggest leveraging Google's image search API to do the heavy 'technical lifting' for you. Your job is then to find the right combination of filters that will get you the best results. Here are some to consider:

  • Keywords. You could try to search by location (coordinates), but then you would have to rely on the accuracy of image metadata. Instead, how about generalizing the coordinates to a named place and looking that up? For example, you could generalize (40.812694, -74.074177) to the New York Giants' stadium rather than a generic skyline of New York.
  • Resolution. It's safe to assume higher resolution pictures are more likely to be overview shots and taken with professional equipment. You can also consider the aspect ratio: images taller than they are wide tend to focus on a single object of interest, while images wider than they are tall tend to have more variety.
  • Licensing. Google's image search is capable of filtering by license and can ensure (for the most part) that you can reuse the images it finds.
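The resolution and aspect-ratio heuristics above could be combined into a simple scoring function for ranking candidate images. The thresholds and weights here are illustrative guesses, not tuned values:

```python
def overview_score(width, height):
    """Heuristic score: prefer high-resolution, landscape-oriented images."""
    resolution = width * height
    aspect = width / height
    score = resolution / 1_000_000       # megapixels as the base score
    if aspect > 1.2:                     # wider than tall: likely a scenic overview
        score *= 1.5
    elif aspect < 0.8:                   # taller than wide: likely a single subject
        score *= 0.5
    return score

# Usage: pick the best candidate among (width, height) pairs.
candidates = [(1920, 1080), (600, 800), (3000, 2000)]
best = max(candidates, key=lambda wh: overview_score(*wh))
print(best)  # (3000, 2000)
```

In practice you'd tune these weights against a labeled sample of "good overview" versus "arbitrary" images for a handful of POIs.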

Upvotes: 3

hmak

Reputation: 3978

Of course you don't need to crawl the web for this. You can use a Google API to search for images and retrieve them. Take a look at this article.
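For the Google route, image search is exposed through the Custom Search JSON API. A sketch of building the request URL (the key and cx search-engine ID are placeholders you'd obtain from Google, and the rights filter value is an assumption):

```python
from urllib.parse import urlencode

def google_image_search_url(api_key, cx, query):
    """Build a Custom Search JSON API request restricted to images."""
    params = {
        "key": api_key,
        "cx": cx,               # custom search engine ID
        "q": query,             # e.g. a POI name like "Conservatory of Flowers"
        "searchType": "image",  # return image results only
        "imgSize": "xlarge",    # bias toward high-resolution overview shots
        "rights": "cc_publicdomain",  # license filter (assumption)
    }
    return "https://www.googleapis.com/customsearch/v1?" + urlencode(params)

print(google_image_search_url("YOUR_KEY", "YOUR_CX", "nyc skyline"))
```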

Upvotes: 0
