ML_Engine

Reputation: 1185

Applying function to pandas dataframe

I have a pandas dataframe called 'tourdata' consisting of 676k rows of data. Two of the columns are latitude and longitude.

Using the reverse_geocode package, I want to convert these coordinates to country data.

When I call :

import reverse_geocode as rg

tourdata['Country'] = rg.search((row[tourdata['latitude']],row[tourdata['longitude']]))

I get the error :

ValueError                                Traceback (most recent call last)
in ()
      1 coordinates = (tourdata['latitude'],tourdata['longitude']),
----> 2 tourdata['Country'] = rg.search((row[tourdata['latitude']],row[tourdata['longitude']]))

~/anaconda/envs/py3/lib/python3.6/site-packages/reverse_geocode/__init__.py in search(coordinates)
    114     """
    115     gd = GeocodeData()
--> 116     return gd.query(coordinates)
    117
    118

~/anaconda/envs/py3/lib/python3.6/site-packages/reverse_geocode/__init__.py in query(self, coordinates)
     46         except ValueError as e:
     47             logging.info('Unable to parse coordinates: {}'.format(coordinates))
---> 48             raise e
     49         else:
     50             results = [self.locations[index] for index in indices]

~/anaconda/envs/py3/lib/python3.6/site-packages/reverse_geocode/__init__.py in query(self, coordinates)
     43         """
     44         try:
---> 45             distances, indices = self.tree.query(coordinates, k=1)
     46         except ValueError as e:
     47             logging.info('Unable to parse coordinates: {}'.format(coordinates))

ckdtree.pyx in scipy.spatial.ckdtree.cKDTree.query()

ValueError: x must consist of vectors of length 2 but has shape (2, 676701)

To test that the package is working :

coordinates = (tourdata['latitude'][0],tourdata['longitude'][0]),
results = (rg.search(coordinates))
print(results)

Outputs :

[{'country_code': 'AT', 'city': 'Wartmannstetten', 'country': 'Austria'}]

Any help with this is appreciated. Ideally, I'd like to access the resulting dictionary and store only the country code in the Country column.

Upvotes: 1

Views: 623

Answers (1)

Barthelemy Pavy

Reputation: 570

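The search method expects a list of coordinate pairs, one (latitude, longitude) tuple per point. Passing two whole columns gives it an array of shape (2, 676701) instead of 676701 vectors of length 2, which is what the error reports. To look up a single point, you can use the "get" method.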

Try :

tourdata['country'] = tourdata.apply(lambda x: rg.get((x['latitude'], x['longitude'])), axis=1)

It works fine for me :

import pandas as pd
import reverse_geocode as rg
tourdata = pd.DataFrame({'latitude':[0.3, 2, 0.6], 'longitude':[12, 5, 0.8]})
tourdata['country'] = tourdata.apply(lambda x: rg.get((x['latitude'], x['longitude'])), axis=1)
tourdata['country']

Output :

0    {'country': 'Gabon', 'city': 'Booué', 'country...
1    {'country': 'Sao Tome and Principe', 'city': '...
2    {'country': 'Ghana', 'city': 'Mumford', 'count...
Name: country, dtype: object
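
If you only want the country code, and want to avoid a row-by-row apply over 676k rows, you could also pass all the coordinate pairs to search in one call. The following is a rough sketch, not something I have run on your data: add_country_codes is a hypothetical helper name, and it assumes the search function returns one dict per coordinate, in input order, with a 'country_code' key (as the output in the question suggests).

```python
import pandas as pd


def add_country_codes(df, search, lat_col="latitude", lon_col="longitude"):
    """Return a copy of df with a 'Country' column of country codes.

    `search` is any callable that takes a list of (lat, lon) tuples and
    returns one dict per coordinate, in the same order. This is assumed
    to be how rg.search behaves.
    """
    # Build (lat, lon) pairs: N vectors of length 2, not 2 vectors of length N.
    coords = list(zip(df[lat_col], df[lon_col]))
    results = search(coords)
    # Keep only the country code from each result dict.
    codes = [r["country_code"] for r in results]
    return df.assign(Country=codes)


# Intended use (assumes reverse_geocode is installed):
# import reverse_geocode as rg
# tourdata = add_country_codes(tourdata, rg.search)
```

Taking the search function as an argument is only there so the helper can be tried with a stand-in lookup; in practice you would pass rg.search directly.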

Upvotes: 1
