manuel quiros

Reputation: 13

How to pass two dataframe columns to scipy.spatial.distance.pdist

I have a dataframe with location names as the index and two columns, LATITUDE and LONGITUDE.

import pandas as pd

data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
df = pd.DataFrame.from_dict(data, orient='index')

         LATITUDE  LONGITUDE
LE0039  59.522583  29.566056
LE0073  59.287991  31.369472
LE0142  59.350241  32.531339
LE0278  59.964750  29.195850

For each site, I need to calculate the minimum distance to any other site and store it in a third column. I want to compute the distance matrix with scipy.spatial.distance.pdist(), but to do that I first need a new column containing (LATITUDE, LONGITUDE) pairs to pass to pdist().

So I have two questions. First, how do I combine latitude and longitude into an array of (lat, long) pairs? Second, is there a better way to calculate the minimum distance?

Upvotes: 1

Views: 2374

Answers (1)

Yuca

Reputation: 6091

Use the good old combination of list + zip: zip creates the paired objects, and list turns them into a list so the result can be assigned to the dataframe.

df['combined'] = list(zip(df.LATITUDE, df.LONGITUDE))

Output:

LE0039  59.522583   29.566056   (59.522583, 29.566056)
LE0073  59.287991   31.369472   (59.287991000000005, 31.369472)
LE0142  59.350241   32.531339   (59.350241000000004, 32.531339)
LE0278  59.964750   29.195850   (59.96475, 29.19585)

Side note: I'm intrigued by the decimal expansion; I have no idea why there's a trailing 000005.
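One caveat when feeding this to pdist(): pdist expects a 2-D array of shape (m, n), and a column of tuples is a 1-D object column, so the tuples have to be stacked back into an (m, 2) array first. The minimal sketch below, assuming NumPy and SciPy are installed, shows that conversion and also that selecting the two columns directly gives the same array, so the combined column is optional for pdist. The variable names coords_from_tuples, coords_direct and condensed are illustrative only.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
df = pd.DataFrame.from_dict(data, orient='index')
df['combined'] = list(zip(df.LATITUDE, df.LONGITUDE))

# pdist needs an (m, n) 2-D array, not a column of tuples,
# so stack the tuples back into an (m, 2) float array
coords_from_tuples = np.array(df['combined'].tolist())

# Same array, taken straight from the two columns (no tuple column needed)
coords_direct = df[['LATITUDE', 'LONGITUDE']].to_numpy()

# Condensed distance vector: m * (m - 1) / 2 = 6 pairs for 4 sites
condensed = pdist(coords_direct)
```

Both arrays produce the same condensed vector; going through the columns directly just skips the round trip through tuples.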

Regarding distances, NumPy and SciPy have a plethora of options, many more than I'm familiar with (scipy.spatial.distance alone provides pdist, cdist, squareform and many built-in metrics), so a quick search should turn up plenty of good alternatives. I usually stick with pdist().
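For the second question, one way to get the minimum distance per site with pdist is to expand the condensed result into a full matrix with squareform, set the diagonal to infinity so a site is never its own nearest neighbour, and take the row-wise minimum. The sketch below is not the only approach; it assumes NumPy, SciPy and pandas are installed. Note that plain Euclidean distance on raw degrees is not a physical distance, because a degree of longitude shrinks with latitude. The sketch therefore also passes a haversine (great-circle) function as a callable metric, which pdist accepts, under the assumption of a spherical Earth with a mean radius of 6371 km. The helper names haversine_km and nearest_site, the constant EARTH_RADIUS_KM, and the MIN_DIST_*/NEAREST_* column names are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

data = {'LE0039': {'LATITUDE': 59.522583, 'LONGITUDE': 29.566056},
        'LE0073': {'LATITUDE': 59.287991, 'LONGITUDE': 31.369472},
        'LE0142': {'LATITUDE': 59.350241, 'LONGITUDE': 32.531339},
        'LE0278': {'LATITUDE': 59.96475, 'LONGITUDE': 29.19585}}
df = pd.DataFrame.from_dict(data, orient='index')

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth, mean radius


def haversine_km(u, v):
    """Hypothetical helper: great-circle distance in km between two
    (lat, lon) points given in degrees."""
    lat1, lon1 = np.radians(u)
    lat2, lon2 = np.radians(v)
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def nearest_site(frame, metric):
    """Hypothetical helper: for every row, return the distance to the
    closest other row and that row's index label."""
    coords = frame[['LATITUDE', 'LONGITUDE']].to_numpy()
    dist = squareform(pdist(coords, metric))  # condensed -> full (m, m) matrix
    np.fill_diagonal(dist, np.inf)            # exclude each site's zero self-distance
    labels = frame.index.to_numpy()[dist.argmin(axis=1)]
    return dist.min(axis=1), labels


# Euclidean on raw degrees (only a rough proxy for real distance)
df['MIN_DIST_DEG'], df['NEAREST_DEG'] = nearest_site(df, 'euclidean')
# Great-circle distance in kilometres via a callable metric
df['MIN_DIST_KM'], df['NEAREST_KM'] = nearest_site(df, haversine_km)
```

For these four sites both metrics pick the same neighbours, but that need not hold in general: at higher latitudes or over larger spreads, degree-based Euclidean distance can rank neighbours differently from great-circle distance. The full m x m matrix also grows quadratically, so for many sites a spatial index may be worth looking into.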

Upvotes: 2
