alwaysaskingquestions

Reputation: 1657

Data populated in the DataFrame during a for loop is gone after the loop

I have a DataFrame with about 5 columns. Two of them hold longitude/latitude pairs as tuples. I also have a user-defined function that calculates the distance between two given lon/lat tuples.

data_all['gc_distance'] = ""

### let's start calculate the great circle distance
for idx, row in data_all.iterrows():
    row['gc_distance'] = gcd.dist(row['ping_location'], row['destination'])
    print(row)

Basically, I create an empty column named gc_distance, then iterate through each row to calculate the distance. When I print each row inside the loop, the data looks great.

Sample print of a row:

created_at_des                                     2018-01-17 18:55:55.154000
location_missing                                                            0
ping_location                                (-121.9419444444, 37.4897222222)
destination                                            (-122.15057, 37.39465)
gc_distance                                                          23.85 km
Name: 393529, dtype: object

As you can see, gc_distance DOES have a value.

Here's the sample output from the print statement after the loop:

   location_missing              ping_location  \
0                 0   (-152.859052, 51.218273)
1                 0    (120.585289, 31.298974)
2                 0    (120.585289, 31.298974)
3                 0    (120.585289, 31.298974)
4                 0  (121.4737021, 31.2303904)

                destination gc_distance
0  (-122.057005, 37.606922)
1  (-122.057005, 37.606922)
2  (-122.057005, 37.606922)
3  (-122.057005, 37.606922)
4  (-122.057005, 37.606922)

However, when I print the DataFrame again outside the for loop, the gc_distance column has only blank values! :(

Why is this? There's no compile-time or runtime error, and all other outputs look good. Why is this calculated field missing after the loop, even though it has a value when I print it inside the loop?

Upvotes: 0

Views: 36

Answers (1)

Julian Rachman

Reputation: 575

Try this method out:

import pandas as pd
import numpy as np
import math

def dist(i):
    diff = list(map(lambda a,b: a-b, df['a'][i], df['b'][i]))
    squared = [(k)**2 for k in diff]
    squared_diff = sum(squared)
    root = math.sqrt(squared_diff)
    return root



df = pd.DataFrame([[0, 0, 5, 6, '', '', ''], [2, 6, -5, 8, '', '', '']], columns = ["x_a", "y_a", "x_b", "y_b", "a", "b", "dist"])
print(df)

#data_all['ping_location'] = list(zip(data_all.longitude_evnt, data_all.lattitude_evnt))

df['a'] = list(zip(df.x_a, df.y_a))     
df['b'] = list(zip(df.x_b, df.y_b)) 
print(df)

for i in range(0, len(df)):
    df['dist'][i] = dist(i)
    print(dist(i))

print(df)

This is my terminal output:

   x_a  y_a  x_b  y_b a b dist
0    0    0    5    6         
1    2    6   -5    8         
   x_a  y_a  x_b  y_b       a        b dist
0    0    0    5    6  (0, 0)   (5, 6)     
1    2    6   -5    8  (2, 6)  (-5, 8)     
test.py:24: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  df['dist'][i] = dist(i)
7.810249675906654
7.280109889280518
   x_a  y_a  x_b  y_b       a        b     dist
0    0    0    5    6  (0, 0)   (5, 6)  7.81025
1    2    6   -5    8  (2, 6)  (-5, 8)  7.28011
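
As for why the original loop loses its values: iterrows() hands back each row as a separate Series, which for a mixed-dtype frame like this one is typically a copy, so assigning to row['gc_distance'] changes that temporary object and not data_all. The SettingWithCopyWarning in my output above has a similar cause (the chained assignment df['dist'][i] = ...), and on recent pandas versions that form may not update the frame at all. Writing through .at (or .loc) on the DataFrame itself avoids both problems. Here is a minimal sketch of that idea. Note that haversine_km is a hypothetical stand-in I wrote because gcd.dist isn't shown in the question, and it assumes (lon, lat) tuples in degrees and a mean Earth radius of 6371 km, so its numbers may not match yours exactly:

```python
import math
import pandas as pd

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) tuples in degrees.

    Hypothetical stand-in for the asker's gcd.dist, which is not shown.
    """
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

# Two rows taken from the printouts in the question.
data_all = pd.DataFrame({
    "ping_location": [(-121.9419444444, 37.4897222222), (120.585289, 31.298974)],
    "destination": [(-122.15057, 37.39465), (-122.057005, 37.606922)],
})

# Option 1: keep the loop, but write into the DataFrame by label
# instead of into the temporary row object.
data_all["gc_distance"] = float("nan")
for idx, row in data_all.iterrows():
    data_all.at[idx, "gc_distance"] = haversine_km(
        row["ping_location"], row["destination"]
    )

# Option 2: no explicit loop; build the whole column in one assignment.
data_all["gc_distance_apply"] = data_all.apply(
    lambda row: haversine_km(row["ping_location"], row["destination"]),
    axis=1,
)

print(data_all[["gc_distance", "gc_distance_apply"]])
```

Either option should leave the values in data_all after the loop finishes. Option 2 is usually the tidier choice since it avoids row-by-row assignment entirely.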

Upvotes: 1
