Reputation: 525
I have a CSV file containing address data, mostly in Finnish. I need to read that file and get geocode information for each address. But it doesn't work for Finnish characters and says it can't read them. Can anybody please help me out?
import urllib,urllib2,time
addr_file = 'address.csv'
out_file = 'addresses_geocoded.csv'
out_file_failed = 'failed.csv'
sleep_time = 2
root_url = "http://maps.google.com/maps/geo?"
gkey = "asfasdfasdfasdf" # not an actual value
return_codes = {'200':'SUCCESS',
                '400':'BAD REQUEST',
                '500':'SERVER ERROR',
                '601':'MISSING QUERY',
                '602':'UNKNOWN ADDRESS',
                '603':'UNAVAILABLE ADDRESS',
                '604':'UNKNOWN DIRECTIONS',
                '610':'BAD KEY',
                '620':'TOO MANY QUERIES'
                }
def geocode_for_musiquitous(addr_file,out_fmt='csv'):
    #encode our dictionary of url parameters
    values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
    data = urllib.urlencode(values)
    #set up our request
    url = root_url+data
    req = urllib2.Request(url)
    #make request and read response
    response = urllib2.urlopen(req)
    geodat = response.read().split(',')
    response.close()
    # this section just handles the data returned from google
    code = return_codes[geodat[0]]
    if code == 'SUCCESS':
        code,precision,lat,lng = geodat
        return {'code':code,'precision':precision,'lat':lat,'lng':lng}
    else:
        return {'code':code}
def main():
    #open i/o files
    outf = open(out_file,'w')
    outf_failed = open(out_file_failed,'w')
    inf = open(addr_file,'r')
    for address in inf:
        #get latitude and longitude of address
        data = geocode_for_musiquitous(address)
        #output results and log to file
        if len(data)>1:
            print "Latitude and Longitude of "+address+":"
            print "\tLatitude:",data['lat']
            print "\tLongitude:",data['lng']
            outf.write(address.strip()+','+data['lat']+','+data['lng']+'\n')
            outf.flush()
        else:
            print "Geocoding of '"+address.strip()+"' failed with error code "+data['code']
            outf_failed.write(address)
            outf_failed.flush()
        time.sleep(sleep_time)
    #clean up
    inf.close()
    outf.close()
    outf_failed.close()
if __name__ == "__main__":
    main()
Upvotes: 1
Views: 845
Reputation: 26138
You need to open the file with the correct encoding, using the codecs module. The correct encoding for Finnish is probably ISO-8859-1:
inf = codecs.open(addr_file,'r', 'iso-8859-1')
If this is not the correct encoding for your file, you need to find out what the correct encoding is, then check whether a codec for it is available, like below:
import codecs
codec = codecs.lookup("iso-8859-1")
print codec.name
If codecs.lookup() returns a codec object for the encoding you are looking for (it raises LookupError otherwise), then it is available and can be used in codecs.open().
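As a rough sketch of the "find out the correct encoding" step: you can try a short list of candidate encodings against the raw bytes and keep the first one that decodes cleanly. The helper name pick_encoding and the candidate list are my own assumptions, not part of the codecs module. Note that ISO-8859-1 accepts any byte sequence, so it has to be tried last.

```python
import codecs

def pick_encoding(raw_bytes, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate that decodes raw_bytes, or None (hypothetical helper)."""
    for name in candidates:
        try:
            codecs.lookup(name)      # LookupError if no such codec is installed
            raw_bytes.decode(name)   # UnicodeDecodeError if the bytes do not fit
        except (LookupError, UnicodeDecodeError):
            continue
        return name
    return None

# Made-up sample address, encoded the way a Latin-1 CSV export would store it.
sample = u"H\u00e4meentie 5, Helsinki".encode("iso-8859-1")
print(pick_encoding(sample))
```

This only tells you that the bytes are decodable with that codec, not that the result is the text the author meant, so it is worth eyeballing a few decoded lines as well.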
Upvotes: 0
Reputation: 70218
The value passed to urllib.urlencode should be UTF-8 encoded beforehand:
addr_file = addr_file.encode("utf-8")
values = {'q' : addr_file, 'output':out_fmt, 'key':gkey}
data = urllib.urlencode(values)
And make sure you open the CSV file with the correct encoding (might be "windows-1252" or "iso-8859-1"):
inf = codecs.open(addr_file, 'r', 'iso-8859-1')
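To see what the UTF-8 step does to the query string, here is a small standalone sketch. The address is a made-up sample, and the try/except import is only there so the snippet also runs on Python 3, where urlencode lives in urllib.parse.

```python
try:
    from urllib import urlencode          # Python 2, as in the question
except ImportError:
    from urllib.parse import urlencode    # Python 3

# Unicode string, as you would get from a file opened with codecs.open
address = u"H\u00e4meentie 5, Helsinki"

# Encode to UTF-8 bytes first, then percent-encode for the URL
query = urlencode({'q': address.encode('utf-8')})
print(query)  # q=H%C3%A4meentie+5%2C+Helsinki
```

The "ä" ends up as the two percent-escaped UTF-8 bytes %C3%A4, which is the form a URL can carry safely.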
Upvotes: 1
Reputation: 86482
Use the codecs module.
codecs.open(filename, mode[, encoding[, errors[, buffering]]])
Open an encoded file using the given mode and return a wrapped version providing transparent encoding/decoding. The default file mode is 'r' meaning to open the file in read mode.
You can use wrapped file objects to read encoded files into unicode strings.
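A minimal sketch of reading through the wrapped file object, assuming the file is ISO-8859-1. The temporary file and its one-line content are made up for the example; in the question you would pass addr_file instead.

```python
import codecs
import os
import tempfile

# Create a small ISO-8859-1 sample file (hypothetical content).
path = os.path.join(tempfile.mkdtemp(), "address.csv")
with open(path, "wb") as f:
    f.write(u"H\u00e4meentie 5, Helsinki\n".encode("iso-8859-1"))

# Read it back through the wrapper: every line arrives already decoded
# as a unicode string, so no manual .decode() calls are needed.
with codecs.open(path, "r", "iso-8859-1") as inf:
    lines = [line.strip() for line in inf]

print(repr(lines[0]))
```

From here each line can be encoded to UTF-8 before it goes into the request URL.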
Upvotes: 0
Reputation: 449733
I don't know Python, but I'm pretty sure this is an encoding issue.
Make sure your address file is UTF-8 encoded and that the urlencode() function you use can deal with UTF-8 characters (the latter shouldn't be a problem though, as Python can handle UTF-8 natively as far as I know).
Upvotes: 0