malligator

Reputation: 129

dataframe overwritten when using list comprehension

I am attempting to create four new pandas DataFrames via a list comprehension. Each new DataFrame should be the original 'constituents_file' DataFrame with two new columns: 'tenor', which labels the number of years, and 'maturity', which is the existing 'effectivedate' column plus that number of years. The example code is below:

import pandas as pd

def add_maturity(df, tenor):
    df['tenor'] = str(tenor) + 'Y'
    df['maturity'] = df['effectivedate'] + pd.DateOffset(years=tenor)
    return df

year_list = [3, 5, 7, 10]
new_dfs = [add_maturity(constituents_file, tenor) for tenor in year_list]

I expect the new_dfs list to contain four DataFrames, each with a different value for 'tenor' and 'maturity'. Instead, all four DataFrames hold the same data, with a 'tenor' of '10Y' and a 'maturity' that is 10 years after the 'effectivedate' column.

I suspect that each time I iterate through the list comprehension each existing dataframe is overwritten with the latest call to the function. I just can't work out how to stop this happening.

Many thanks

Upvotes: 1

Views: 311

Answers (1)

filbranden

Reputation: 8898

Assigning a column to a DataFrame modifies that object in place. And when you pass a DataFrame as an argument to a function, what you're passing is a reference to the object, not a copy. In this case it's a reference to the same DataFrame object every time, so each call overwrites the previous results, and new_dfs ends up holding four references to that one object.
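To see the aliasing directly, here is a minimal sketch that reruns the code from the question and checks object identity. The two-row `constituents_file` is a made-up stand-in for the asker's data, and `all_same_object` and `tenors_seen` are names introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; only 'effectivedate' matters here.
constituents_file = pd.DataFrame(
    {"effectivedate": pd.to_datetime(["2020-01-15", "2021-06-30"])}
)

def add_maturity(df, tenor):
    # Version from the question: writes the new columns into df itself.
    df["tenor"] = str(tenor) + "Y"
    df["maturity"] = df["effectivedate"] + pd.DateOffset(years=tenor)
    return df

year_list = [3, 5, 7, 10]
new_dfs = [add_maturity(constituents_file, tenor) for tenor in year_list]

# Every list element is the same object as the input DataFrame.
all_same_object = all(d is constituents_file for d in new_dfs)

# Only the values from the last call survive.
tenors_seen = {d["tenor"].iloc[0] for d in new_dfs}

print(all_same_object, tenors_seen)
```

The identity check with `is` should come out true for every element, which is why all four entries show the values from the final call.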

To solve this issue, you can either create a copy of the DataFrame at the start of the function:

def add_maturity(df, tenor):
    df = df.copy()
    df['tenor'] = str(tenor) + 'Y'
    df['maturity'] = df['effectivedate'] + pd.DateOffset(years=tenor)
    return df

(Or you could keep the function as is, and have the caller copy the DataFrame first when passing it as an argument...)
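A rough sketch of that caller-side variant, assuming the same made-up one-row `constituents_file` as a stand-in for the real data: the function is left exactly as in the question, and the copy happens in the list comprehension.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
constituents_file = pd.DataFrame(
    {"effectivedate": pd.to_datetime(["2020-01-15"])}
)

def add_maturity(df, tenor):
    # Unchanged from the question: still mutates whatever it is given.
    df["tenor"] = str(tenor) + "Y"
    df["maturity"] = df["effectivedate"] + pd.DateOffset(years=tenor)
    return df

year_list = [3, 5, 7, 10]

# The caller hands over a fresh copy on each iteration,
# so each call mutates its own DataFrame.
new_dfs = [add_maturity(constituents_file.copy(), tenor) for tenor in year_list]

tenors = [d["tenor"].iloc[0] for d in new_dfs]
print(tenors)
```

The drawback of this approach is that every caller has to remember the `.copy()`; forgetting it brings the original problem back.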

Or you can use the assign() method, which returns a new DataFrame with the modified columns:

def add_maturity(df, tenor):
    return df.assign(
        tenor=str(tenor) + 'Y',
        maturity=df['effectivedate'] + pd.DateOffset(years=tenor),
    )

(Personally, I'd go with the latter. It's similar to how most DataFrame methods work, in that they typically return a new DataFrame rather than modifying it in place.)
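For completeness, a small end-to-end sketch of the assign() version with the list comprehension from the question. The one-row `constituents_file` is again an invented stand-in, and `maturities` is a name used only for this check.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data.
constituents_file = pd.DataFrame(
    {"effectivedate": pd.to_datetime(["2020-01-15"])}
)

def add_maturity(df, tenor):
    # assign() returns a new DataFrame and leaves df untouched.
    return df.assign(
        tenor=str(tenor) + "Y",
        maturity=df["effectivedate"] + pd.DateOffset(years=tenor),
    )

year_list = [3, 5, 7, 10]
new_dfs = [add_maturity(constituents_file, tenor) for tenor in year_list]

# One maturity date per tenor, taken from the first row of each result.
maturities = [d["maturity"].iloc[0] for d in new_dfs]
print(maturities)

# The input DataFrame still has only its original column.
print(list(constituents_file.columns))
```

Because nothing is mutated, the same `constituents_file` can safely be reused for every tenor without any explicit copying.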

Upvotes: 1
