Reputation: 13
I have a dataframe with location names in the index and two columns, LATITUDE and LONGITUDE.
import pandas as pd

data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
# orient='index' makes the outer dict keys the row labels
df = pd.DataFrame.from_dict(data, orient='index')
         LATITUDE  LONGITUDE
LE0039  59.522583  29.566056
LE0073  59.287991  31.369472
LE0142  59.350241  32.531339
LE0278  59.964750  29.195850
For each site, I need to calculate the minimum distance to any other site and store it in a third column. I want to compute the distance matrix with scipy.spatial.distance.pdist(), but to do that I first need a new column of (LATITUDE, LONGITUDE) pairs to pass to pdist().
So I have two questions: how do I combine lat and long into an array of (lat, long) pairs, and is there a better way to calculate the minimum distance?
Upvotes: 1
Views: 2374
Reputation: 6091
Use the good old combo of list + zip. zip creates the paired values and list turns them into a list so it can be assigned to the dataframe.
df['combined'] = list(zip(df.LATITUDE, df.LONGITUDE))
output:
         LATITUDE  LONGITUDE                         combined
LE0039  59.522583  29.566056           (59.522583, 29.566056)
LE0073  59.287991  31.369472  (59.287991000000005, 31.369472)
LE0142  59.350241  32.531339  (59.350241000000004, 32.531339)
LE0278  59.964750  29.195850             (59.96475, 29.19585)
Side note: I'm very intrigued by the decimal expansion; I have no idea why there's a trailing 000005 on some values.
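If the pairs are only needed as input for pdist(), a column of tuples may not be necessary: pdist() takes a 2-D array with one row per point. A minimal sketch, assuming the same data as in the question (the variable name coords is just illustrative):

```python
import pandas as pd

data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
df = pd.DataFrame.from_dict(data, orient='index')

# Select the two columns explicitly so the column order is (lat, lon),
# then convert to a plain NumPy array with one row per site.
coords = df[['LATITUDE', 'LONGITUDE']].to_numpy()

print(coords.shape)
```

Each row of coords is one (lat, long) pair, in the same order as the dataframe index.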
Regarding distances, numpy and scipy should have a plethora of options, way more than what I'm familiar with, so you should find many good alternatives after a quick search on Google :) I usually stick with pdist().
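For the minimum-distance part, here is one possible sketch built on pdist(). Note that pdist()'s default Euclidean metric on raw degrees is not a real ground distance, so this example passes a custom metric instead. The haversine_km helper, the 6371 km Earth radius, and the MIN_DIST_KM and NEAREST column names are my own assumptions, not something from the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform


def haversine_km(u, v):
    """Great-circle distance in km between two (lat, lon) points given in degrees.

    Hypothetical helper; assumes a spherical Earth with radius 6371 km.
    """
    lat1, lon1, lat2, lon2 = np.radians([u[0], u[1], v[0], v[1]])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))


data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
df = pd.DataFrame.from_dict(data, orient='index')
coords = df[['LATITUDE', 'LONGITUDE']].to_numpy()

# pdist() returns the condensed pairwise distances; squareform() expands
# them into a full symmetric n x n matrix.
dist = squareform(pdist(coords, metric=haversine_km))

# The diagonal holds each site's distance to itself (0), so mask it out
# before taking the row-wise minimum.
np.fill_diagonal(dist, np.inf)

df['MIN_DIST_KM'] = dist.min(axis=1)
df['NEAREST'] = df.index.to_numpy()[dist.argmin(axis=1)]

print(df)
```

A Python callable as the metric is slow for large inputs, since it is called once per pair. For many sites, a vectorized haversine or a tree-based nearest-neighbour search would probably scale better.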
Upvotes: 2