Reputation: 1657
I have a DataFrame with about 5 columns. Two of them hold longitude/latitude pairs stored as tuples. I also have a user-defined function that calculates the distance between two given (lon, lat) tuples.
data_all['gc_distance'] = ""
### let's start calculate the great circle distance
for idx, row in data_all.iterrows():
    row['gc_distance'] = gcd.dist(row['ping_location'], row['destination'])
    print(row)
So basically, I created an empty column named gc_distance, then I iterate through each row to calculate the distance. When I print each row inside the loop, the data looks fine.
sample print of a row:
created_at_des 2018-01-17 18:55:55.154000
location_missing 0
ping_location (-121.9419444444, 37.4897222222)
destination (-122.15057, 37.39465)
gc_distance 23.85 km
Name: 393529, dtype: object
As you can see, gc_distance DOES have a value inside the loop.
Here's the sample output from the print statement after the loop:
location_missing ping_location \
0 (-152.859052, 51.218273)
0 (120.585289, 31.298974)
0 (120.585289, 31.298974)
0 (120.585289, 31.298974)
0 (121.4737021, 31.2303904)
destination gc_distance
0 (-122.057005, 37.606922)
1 (-122.057005, 37.606922)
2 (-122.057005, 37.606922)
3 (-122.057005, 37.606922)
4 (-122.057005, 37.606922)
However, when I print the DataFrame again outside of the for loop, the gc_distance column has only blank values.
Why is this? There's no compile-time or run-time error, and all the other columns look good. Why is the calculated field missing after the loop, even though it has a value when I print each row inside the loop?
Upvotes: 0
Views: 36
Reputation: 575
Try this method out:
import pandas as pd
import numpy as np
import math
def dist(i):
    diff = list(map(lambda a,b: a-b, df['a'][i], df['b'][i]))
    squared = [(k)**2 for k in diff]
    squared_diff = sum(squared)
    root = math.sqrt(squared_diff)
    return root
df = pd.DataFrame([[0, 0, 5, 6, '', '', ''], [2, 6, -5, 8, '', '', '']], columns = ["x_a", "y_a", "x_b", "y_b", "a", "b", "dist"])
print(df)
#data_all['ping_location'] = list(zip(data_all.longitude_evnt, data_all.lattitude_evnt))
df['a'] = list(zip(df.x_a, df.y_a))
df['b'] = list(zip(df.x_b, df.y_b))
print(df)
for i in range(0, len(df)):
    df['dist'][i] = dist(i)
    print(dist(i))
print(df)
This is my terminal output:
x_a y_a x_b y_b a b dist
0 0 0 5 6
1 2 6 -5 8
x_a y_a x_b y_b a b dist
0 0 0 5 6 (0, 0) (5, 6)
1 2 6 -5 8 (2, 6) (-5, 8)
test.py:24: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
df['dist'][i] = dist(i)
7.810249675906654
7.280109889280518
x_a y_a x_b y_b a b dist
0 0 0 5 6 (0, 0) (5, 6) 7.81025
1 2 6 -5 8 (2, 6) (-5, 8) 7.28011
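As for why the loop in the question loses its values: the pandas documentation for iterrows() notes that, depending on the dtypes, the iterator returns a copy of each row rather than a view, so assigning to row['gc_distance'] does not write into data_all. Here is a minimal sketch of two ways to write the result back to the frame itself. Note the assumptions: haversine_km is a hypothetical stand-in for the question's gcd.dist (which is not shown, so the numbers need not match it), and the two-row frame is built from coordinates that appear in the question.

```python
import math

import pandas as pd


def haversine_km(p, q):
    """Hypothetical stand-in for gcd.dist: great-circle distance in km
    between two (lon, lat) tuples given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Tiny illustrative frame using coordinates from the question.
data_all = pd.DataFrame({
    "ping_location": [(-121.9419444444, 37.4897222222), (120.585289, 31.298974)],
    "destination": [(-122.15057, 37.39465), (-122.057005, 37.606922)],
})

# Option 1: keep the loop, but write to the DataFrame, not to the row copy.
data_all["gc_distance"] = float("nan")
for idx, row in data_all.iterrows():
    data_all.at[idx, "gc_distance"] = haversine_km(row["ping_location"],
                                                   row["destination"])

# Option 2: no explicit loop; apply the function to each row.
by_apply = data_all.apply(
    lambda r: haversine_km(r["ping_location"], r["destination"]), axis=1
)

print(data_all)
```

Both options leave the values on data_all after the loop ends. The same .at (or .loc) pattern can replace df['dist'][i] = dist(i) in the code above, which is the chained assignment that the SettingWithCopyWarning is complaining about.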
Upvotes: 1