Create two new fields at once in pandas dataframe based off of calculations of other fields

Question

I am iterating over a series of CSV files as dataframes, eventually writing them all out to a common Excel workbook.

In one of the many files, decimal GPS values (latitude, longitude) are split across two columns (df[4] and df[5]), and I'm converting them to degrees-minutes-seconds. The conversion function returns a tuple, which I'm trying to store in two new fields, dmslat and dmslon, on the same row of the original dataframe:

import glob
import os
from tkinter.filedialog import askdirectory

import pandas as pd

def convert_dd_to_dms(lat, lon):
    # does the math here
    return dmslat, dmslon

csv_dir = askdirectory()  # tkinter directory picker
os.chdir(csv_dir)
for f in glob.iglob("*.csv"):
    (csv_path, csv_name) = os.path.split(f)
    (csv_prefix, csv_ext) = os.path.splitext(csv_name)
    if csv_prefix[-3:] == "loc":
        df = pd.read_csv(f)
        df['dmslat'] = None
        df['dmslon'] = None
        for i, row in df.iterrows():
            # .iloc reads the 5th and 6th values of the row by position
            fixed_coords = convert_dd_to_dms(row.iloc[4], row.iloc[5])
            row['dmslat'] = fixed_coords[0]
            row['dmslon'] = fixed_coords[1]
        print(df)
# process the other files

Using print(), I can see that the coordinates are calculated correctly, but they are never written to the dmslat/dmslon fields.

I have also tried assigning the new fields inside the row loop, but because that assignment targets the whole column rather than the current row, each pass overwrites the entire column with the latest calculated value.
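A small reproduction of that overwrite, assuming the in-loop assignment was written against the column (df["dmslat"] = ...). The toy lat column and the string values are made up: assigning a scalar to a column broadcasts it to every row, so only the last iteration's value survives.

```python
import pandas as pd

df = pd.DataFrame({"lat": [40.5, -33.25, 51.0]})

for i, row in df.iterrows():
    # df["dmslat"] = <scalar> targets the whole column, so the scalar is
    # broadcast to every row and each pass overwrites the previous one.
    df["dmslat"] = f"value from row {i}"

print(df["dmslat"].tolist())
# ['value from row 2', 'value from row 2', 'value from row 2']
```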

How can I get the results to (succinctly) populate the columns?

Rexovas · Accepted Answer

It would appear that df.iterrows() yields a copy of each row, so when you add or update the "dmslat" and "dmslon" entries, you are modifying the copy, not the original dataframe. You can confirm this by printing row after your assignments: the row itself is updated, but the changes do not show up in the original dataframe.
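The pandas documentation for DataFrame.iterrows warns that, depending on the data types, the iterator returns a copy rather than a view, so writing to it may have no effect. A minimal check with a made-up two-row frame (the mixed float/object columns make the copy behaviour deterministic here):

```python
import pandas as pd

df = pd.DataFrame({"lat": [40.5, -33.25], "lon": [-73.25, 151.5]})
df["dmslat"] = None  # object column, like the question's placeholder

for i, row in df.iterrows():
    row["dmslat"] = "changed"  # modifies only the Series yielded for this row
    print(row["dmslat"])       # prints: changed

print(df["dmslat"].tolist())   # [None, None]: the dataframe is untouched
```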

To modify the original dataframe, you can modify your code as such:

        for i, row in df.iterrows():
            fixed_coords = convert_dd_to_dms(row.iloc[4], row.iloc[5])
            df.loc[i, 'dmslat'] = fixed_coords[0]
            df.loc[i, 'dmslon'] = fixed_coords[1]
        print(df)

Assigning through df.loc[i, ...] writes directly into the original dataframe, so the changes persist after the loop.
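Here is the accepted answer in a self-contained, runnable form. The toy columns and the stand-in convert_dd_to_dms (which only formats the numbers instead of doing DMS math) are hypothetical. For comparison, the block also shows a more succinct alternative that is not from the answer: build every (dmslat, dmslon) pair in one pass, then unzip them into the two new columns at once.

```python
import pandas as pd


def convert_dd_to_dms(lat, lon):
    # Hypothetical stand-in for the question's converter. It only has to
    # return a (dmslat, dmslon) pair; the real DMS math is omitted.
    return f"{lat:+.4f}", f"{lon:+.4f}"


def add_dms_with_loc(frame):
    """Accepted answer's approach: write each result back with .loc."""
    frame = frame.copy()  # leave the caller's frame untouched
    frame["dmslat"] = None
    frame["dmslon"] = None
    for i, row in frame.iterrows():
        # Columns 4 and 5 (by position) hold latitude and longitude.
        fixed_coords = convert_dd_to_dms(row.iloc[4], row.iloc[5])
        frame.loc[i, "dmslat"] = fixed_coords[0]
        frame.loc[i, "dmslon"] = fixed_coords[1]
    return frame


def add_dms_with_zip(frame):
    """Alternative (not from the answer): fill both columns at once."""
    frame = frame.copy()
    pairs = [
        convert_dd_to_dms(lat, lon)
        for lat, lon in zip(frame.iloc[:, 4], frame.iloc[:, 5])
    ]
    # Unzip [(lat, lon), ...] into two tuples (assumes at least one row).
    dmslat, dmslon = zip(*pairs)
    frame["dmslat"] = list(dmslat)
    frame["dmslon"] = list(dmslon)
    return frame


# Toy data: columns 4 and 5 are latitude and longitude, as in the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["a", "b"],
    "time": ["t0", "t1"],
    "alt": [10.0, 20.0],
    "lat": [40.5, -33.25],
    "lon": [-73.25, 151.5],
})

with_loc = add_dms_with_loc(df)
with_zip = add_dms_with_zip(df)
print(with_loc[["dmslat", "dmslon"]])
```

Both functions produce the same columns. The zip version makes one assignment per column instead of one .loc write per cell, which is usually faster on large files.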
