Reputation: 193
I have this code that currently runs a for loop against one list:
data3 = []
x=0
while x<len(river_df_list):
    for line in river_df_list[x]:
        try:
            distance = haversine(river_df_list[x][0],river_df_list[x][1],df1_list[0][4],df1_list[0][3])
            data3.append(distance)
            x=x+1
        except IndexError:
            pass
df1_list[0].append(data3.index(min(data3)))
Where the haversine function is:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r
river_df_list (shortened) looks like:
[[151.7753278, -32.90526725, 'HUNTER RIVER'],
 [151.77526830000002, -32.90610052, 'HUNTER RIVER'],
 [151.775397, -32.90977754, 'HUNTER RIVER'],
 [151.775578, -32.91202941, 'HUNTER RIVER'],
 [151.77586340000002, -32.91508789, 'HUNTER RIVER'],
 [151.7764116, -32.91645856, 'HUNTER RIVER'],
 [151.7773432, -32.91905274, 'HUNTER RIVER'],
 [151.7784225, -32.91996844, 'HUNTER RIVER'],
 [151.780565, -32.92181352, 'HUNTER RIVER'],
 [151.7807739, -32.92183623, 'HUNTER RIVER'],
 [151.78591709999998, -32.92187872, 'HUNTER RIVER']]
df1_list (shortened) looks like:
[[5, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '14/08/2015'],
 [6, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '15/08/2015'],
 [7, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '16/08/2015'],
 [8, 'A69-1601-27466', 'Golden perch', -35.5065473, 144.4488804, '17/08/2015']]
Currently, when I run the code at the top, I am able to loop through river_df_list and apply the haversine function for the first point in df1_list. At the end, the code appends the index of the minimum value in data3 to that first row of df1_list, so it now looks like:
[[5, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '14/08/2015', 324110],
 [6, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '15/08/2015'],
 [7, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '16/08/2015'],
 [8, 'A69-1601-27466', 'Golden perch', -35.5065473, 144.4488804, '17/08/2015']]
What I want to do is change the while / for loop at the top so that it compares every point of river_df_list against every point of df1_list and appends the index of the closest river point to the end of each row in df1_list. The desired output would be:
[[5, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '14/08/2015', 324110],
 [6, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '15/08/2015', 32440],
 [7, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '16/08/2015', 31110],
 [8, 'A69-1601-27466', 'Golden perch', -35.5065473, 144.4488804, '17/08/2015', 35479]]
How would I go about doing this?
Upvotes: 2
Views: 60
Reputation: 508
This should work:
for x in df1_list:
    data3 = []
    for y in river_df_list:
        distance = haversine(y[0],y[1],x[4],x[3])
        data3.append(distance)
    x.append(data3.index(min(data3)))
Because every point in df1_list has to be compared with every point in river_df_list, you use a nested loop and work through both. For each row in df1_list, you run through all of river_df_list, compute the haversine distance to each river point, and save it in data3. Then you take the index of the minimum in data3 and append it to that row before moving on to the next row in df1_list. Note that data3 is reset inside the outer loop, so each row gets its own minimum. It works on the toy data you gave.
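If it helps to see the whole thing run end to end, here is a minimal self-contained sketch of the same nested comparison. It assumes, as in the question, that each river point is [lon, lat, name] and that each df1 row holds latitude at index 3 and longitude at index 4. The helper name nearest_river_index is made up for illustration, and the data is a three-point subset of the sample in the question.

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371

def nearest_river_index(row, river_points):
    # Hypothetical helper: index of the river point closest to one df1 row.
    # Assumes row[4] is longitude and row[3] is latitude, as in the question.
    return min(
        range(len(river_points)),
        key=lambda i: haversine(river_points[i][0], river_points[i][1], row[4], row[3]),
    )

# Small subset of the sample data from the question.
river_df_list = [
    [151.7753278, -32.90526725, 'HUNTER RIVER'],
    [151.775397, -32.90977754, 'HUNTER RIVER'],
    [151.78591709999998, -32.92187872, 'HUNTER RIVER'],
]
df1_list = [
    [5, 'A69-1601-27466', 'Golden perch', -35.495479100000004, 144.45295380000002, '14/08/2015'],
    [8, 'A69-1601-27466', 'Golden perch', -35.5065473, 144.4488804, '17/08/2015'],
]

# Outer loop over df1 rows; the inner loop is hidden inside min().
for row in df1_list:
    row.append(nearest_river_index(row, river_df_list))
```

Using min() with a key function still does one haversine call per pair of points, so the cost is the same as the explicit nested loop; it only saves building the intermediate list.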
Edit: Also, data3 seems pretty expensive (both in time and memory) and unnecessary given you only really want the index of the minimum. This would eliminate it:
from sys import maxsize
for x in df1_list:
    min_distance = [maxsize, 0]
    for i, y in enumerate(river_df_list):
        distance = haversine(y[0],y[1],x[4],x[3])
        if distance < min_distance[0]:
            min_distance = [distance, i]
    x.append(min_distance[1])
I'm using maxsize because I don't know how big these distances get. If they're never going to be bigger than 1000000, you could just use that instead.
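If you would rather not pick a sentinel at all, one option is math.inf, which compares greater than any finite float. With r = 6371 the haversine result can never exceed half the Earth's circumference (roughly 20,015 km), so any starting value above that would also do. A minimal sketch of the running-minimum idea on its own, where index_of_smallest is a made-up helper name and the distances are placeholder numbers rather than real results:

```python
from math import inf

def index_of_smallest(distances):
    # Hypothetical helper: track the running minimum and its index
    # without storing the distances in a second list.
    best_distance, best_index = inf, None
    for i, d in enumerate(distances):
        if d < best_distance:
            best_distance, best_index = d, i
    return best_index  # None if the input was empty

# Placeholder distances, only to show the call.
print(index_of_smallest([733.2, 731.9, 734.0]))  # prints 1
```

Because the helper accepts any iterable, you can pass it a generator expression of haversine calls for one row, so the distances are still computed one at a time.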
Upvotes: 1