Modifying and storing a dataframe in a dictionary within a loop

Question

I am working with a dataframe inside a loop. Each iteration performs some operations on the dataframe's columns. At the end of each iteration, I need to store the dataframe in a dictionary, keyed by the iteration index.

For example:

import pandas as pd

df = pd.DataFrame(index=range(20))
dict = {}
for k in range(5):
    df['iter'] = k
    dict[k] = df

My expected result of 'dict' would be a dictionary with 5 dataframes. Say for key value '1', I should have a dataframe 'df' with a column 'iter' that has all values as 1. Similarly, for key value '2', I should have a 'df' with all values 2.

However, I find that all the key values have the same dataframe stored in them. All of them have the value 4 in the dataframe.

I tried running the operations step by step instead of looping. I found that the correct dataframe is stored initially, but in the next iteration, when performing

df['iter'] = k

the value within the dictionary is also getting updated.
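
The behaviour described above can be confirmed with an identity check. The following is a minimal sketch of the same loop; the name frames is introduced here only for illustration (it stands in for the dictionary from the question), and it shows that every key refers to one and the same object.

```python
import pandas as pd

df = pd.DataFrame(index=range(20))
frames = {}  # hypothetical name, used here in place of 'dict'
for k in range(5):
    df['iter'] = k
    frames[k] = df  # stores a reference to df, not a snapshot of it

# True when every key points at the very same DataFrame object
all_same_object = all(frames[k] is df for k in frames)

# Distinct values of 'iter' visible through each key
values_seen = {k: frames[k]['iter'].unique().tolist() for k in frames}

print(all_same_object)
print(values_seen)
```

Since all five keys share a single object, the last assignment to the 'iter' column is what shows up under every key.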

How can I get around this problem? My actual dataframe is much bigger and has many more operations that need to be performed within the loop.

Slayer · Accepted Answer

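You need to store a copy of the dataframe. Assigning df to a dictionary key only stores a reference to the same object, so every key ends up pointing at the one dataframe that the loop keeps modifying. (Also, dict is a terrible variable name because it shadows the built-in dict type. If you really need a name like that, append an underscore.)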

import pandas as pd

df = pd.DataFrame(index=range(20))
dict_ = {}
for k in range(5):
    df['iter'] = k
    dict_[k] = df.copy()  # store an independent snapshot, not a reference
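
As a possible alternative, when the per-iteration work can be written as a transformation that returns a new dataframe, DataFrame.assign avoids the explicit copy because it returns a new object. This is only a sketch under that assumption; the names base and frames are illustrative and not part of the original code.

```python
import pandas as pd

base = pd.DataFrame(index=range(20))

# assign() returns a new DataFrame each time and leaves base untouched
frames = {k: base.assign(iter=k) for k in range(5)}

# Each key now holds its own object with its own values
distinct_objects = len({id(f) for f in frames.values()}) == 5
values_seen = {k: f['iter'].unique().tolist() for k, f in frames.items()}

print(distinct_objects)
print(values_seen)
```

If the operations in the loop modify the dataframe in place, storing df.copy() as above remains the simpler option.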
