I'm trying to extract geotagged photos from the Flickr API using Python, but it returns duplicate photos: once it goes past page 41, it keeps returning the same photo URLs. Here is my code:
#!/usr/bin/env python
# coding=utf-8
from flickrapi import FlickrAPI
import pymongo

client = pymongo.MongoClient("localhost", 27017)
db = client.flickr
coll = db.flickr_a

API_KEY = "xxx"
SECRET_KEY = "xxx"
flickr = FlickrAPI(API_KEY, SECRET_KEY, format="parsed-json")
extras = "url_c,url_l,url_o,geo,date_taken,owner_name"

# Page through the search results for one bounding box
# (min_lon,min_lat,max_lon,max_lat), 100 photos per page.
for page in range(1, 550):
    disney = flickr.photos.search(bbox="139.867,35.613,139.914,35.645",
                                  per_page=100, extras=extras, page=page)
    photos = disney["photos"]
    # photos["photo"] is the list of photo records on this page
    if photos["photo"]:
        coll.insert_many(photos["photo"])
Please give me advice or sample code. Thanks.
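A likely cause: the flickr.photos.search documentation says Flickr returns at most the first 4,000 results for any query. With per_page=100 that is exactly 40 pages, which would explain why results start repeating at page 41. One workaround is to split the bounding box into smaller tiles so that each query stays under that limit. The sketch below assumes the bbox string format used above (min_lon,min_lat,max_lon,max_lat); split_bbox is a hypothetical helper, not part of flickrapi.

```python
def split_bbox(bbox, nx, ny):
    """Split a "min_lon,min_lat,max_lon,max_lat" bbox string into an
    nx-by-ny grid of equal tiles, returned as bbox strings in the same format.
    (Hypothetical helper, not part of flickrapi.)"""
    min_lon, min_lat, max_lon, max_lat = (float(v) for v in bbox.split(","))
    dlon = (max_lon - min_lon) / nx
    dlat = (max_lat - min_lat) / ny
    tiles = []
    for i in range(nx):          # columns, west to east
        for j in range(ny):      # rows, south to north
            tiles.append("%.6f,%.6f,%.6f,%.6f" % (
                min_lon + i * dlon,
                min_lat + j * dlat,
                min_lon + (i + 1) * dlon,
                min_lat + (j + 1) * dlat,
            ))
    return tiles


if __name__ == "__main__":
    # Split the question's bounding box into a 2 x 2 grid of tiles.
    for tile in split_bbox("139.867,35.613,139.914,35.645", 2, 2):
        print(tile)
```

Each tile string can be passed as the bbox argument to flickr.photos.search in place of the single box. A tile whose photos["total"] is still above 4,000 would need to be split further.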
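A quick fix would be to collect the photos in a Python dict keyed by photo id and insert them only at the end, so a repeated photo just overwrites its earlier copy. (Putting the photo dicts straight into a set won't work, because dicts are unhashable.)
At the beginning:
collected = {}
Inside the loop, in place of the insert call:
for p in photos["photo"]:
    collected[p["id"]] = p
And at the end (I'm guessing at your insert command here):
if collected:
    db.flickr_a.insert_many(list(collected.values()))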
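The same idea can be folded into one loop that also stops paging early: a minimal sketch that keys photos by id so repeats are dropped, and stops at the last page the API reports or at a max_results cap (4,000 by default, matching the limit the flickr.photos.search documentation gives for a single query), whichever comes first. collect_unique_photos is a hypothetical helper; it takes the search function as an argument so it can be exercised with a fake, and it assumes the parsed-json response shape used in the question.

```python
def collect_unique_photos(search, per_page=100, max_results=4000, **params):
    """Page through a Flickr-style search and return the unique photo dicts.

    `search` must accept page=, per_page= and any extra keyword params, and
    return {"photos": {"pages": ..., "photo": [...]}} (assumed to match what
    flickr.photos.search returns with format="parsed-json").
    """
    unique = {}  # photo id -> photo dict; dicts keep insertion order (3.7+)
    max_pages = max_results // per_page  # number of pages within the cap
    page = 1
    while page <= max_pages:
        result = search(page=page, per_page=per_page, **params)["photos"]
        for photo in result["photo"]:
            unique.setdefault(photo["id"], photo)  # keep the first copy only
        if page >= int(result["pages"]):
            break  # the API reports no further pages
        page += 1
    return list(unique.values())


if __name__ == "__main__":
    # Fake search: page 3 repeats page 2, like the duplicates in the question.
    fake_pages = {1: ["a", "b"], 2: ["c", "d"], 3: ["c", "d"]}

    def fake_search(page, per_page, **params):
        photos = [{"id": pid} for pid in fake_pages[page]]
        return {"photos": {"pages": 3, "photo": photos}}

    print([p["id"] for p in collect_unique_photos(fake_search)])
    # prints ['a', 'b', 'c', 'd']
```

With the question's setup this would be called as collect_unique_photos(flickr.photos.search, bbox="139.867,35.613,139.914,35.645", extras=extras), followed by coll.insert_many(...) on the returned list when it is non-empty.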