Reputation: 548
I am iterating over a series of csv files as dataframes, eventually writing them all out to a common excel workbook.
In one of the many files, there are decimal GPS values (latitude, longitude) split into two columns (df[4]
and df[5]
) that I'm converting to degrees-minutes-seconds. That method returns a tuple that I'm attempting to park in two new fields called dmslat
and dmslon
in the same row of the original dataframe:
def convert_dd_to_dms(lat, lon):
# does the math here
return dmslat, dmslon
csv_dir = askdirectory() # tkinter directory picker
os.chdir(csv_dir)
for f in glob.iglob("*.csv"):
(csv_path, csv_name) = os.path.split(f)
(csv_prefix, csv_ext) = os.path.splitext(csv_name)
if csv_prefix[-3:] == "loc":
df = pd.read_csv(f)
df['dmslat'] = None
df['dmslon'] = None
for i, row in df.iterrows():
fixed_coords = convert_dd_to_dms(row[4], row[5])
row['dmslat'] = fixed_coords[0]
row['dmslon'] = fixed_coords[1]
print(df)
# process the other files
So when I use a print()
statement I can see the coords are properly calculated but they are not being committed to the dmslat
/dmslon
fields.
I have also tried assigning the new fields within the row iterator, but since I am at the row scale, it ends up overwriting the entire column with the new calculated value every time.
How can I get the results to (succinctly) populate the columns?
Upvotes: 0
Views: 47
Reputation: 12503
I think you better use apply
rather than iterrows
.
Here's a solution that is based on apply
. I replaced your location calculation with a function named 'foo' which does some arbitrary calculation from two fields 'a' and 'b' to new values for 'a' and 'b'.
df = pd.DataFrame({"a": range(10), "b":range(10, 20)})
def foo(row):
return (row["a"] + row["b"], row["a"] * row["b"])
new_df = df.apply(foo, axis=1).apply(pd.Series)
In the above code block, applying 'foo' returns a tuple for every row. Using apply
again with pd.Series
turns it into a data frame.
df[["a", "b"]] = new_df
df.head(3)
a b
0 10 0
1 23 132
2 38 336
Upvotes: 0
Reputation: 478
It would appear that df.iterrows() is resulting in a "copy" of each row, thus when you add/update the columns "dmslat" and "dmslon", you are modifying the copy, not the original dataframe. This can be confirmed by printing "row" after your assignments. You will see the row item was successfully updated, but the changes are not reflected in the original dataframe.
To modify the original dataframe, you can modify your code as such:
for i, row in df.iterrows():
fixed_coords = convert_dd_to_dms(row[4], row[5])
df.loc[i, 'dmslat'] = fixed_coords[0]
df.loc[i, 'dmslon'] = fixed_coords[1]
print(df)
using df.loc
guarantees the changes are made to the original dataframe.
Upvotes: 1