joseph praful

Reputation: 173

Modifying and storing a dataframe in a dictionary within a loop

I am working with a dataframe within a loop. In each iteration, operations are performed on the dataframe's columns. At the end of each iteration, I need to store the dataframe in a dictionary, keyed by the iteration index.

For example:

import pandas as pd

df = pd.DataFrame(index=range(20))
dict = {}
for k in range(5):
    df['iter'] = k
    dict[k] = df

I expect 'dict' to be a dictionary with 5 dataframes. For key 1, I should have a dataframe 'df' whose column 'iter' is 1 in every row. Similarly, for key 2, I should have a 'df' with all values equal to 2.

However, I find that every key has the same dataframe stored under it, and the 'iter' column is 4 in all of them.

I tried running the operations step-by-step, instead of looping. What I found is that, initially the correct dataframe is stored. But in the next iteration step, when performing

df['iter'] = k

the value within the dictionary is also getting updated.

How can I get around this problem? My actual dataframe is much bigger and has many more operations that need to be performed within the loop.

Upvotes: 1

Views: 2468

Answers (2)

Slayer

Reputation: 882

You need to store a copy of the dataframe. (dict is a terrible name; don't use built-in names as variable names. If you do need to use one, append an underscore.)

import pandas as pd

df = pd.DataFrame(index=range(20))
dict_ = {}
for k in range(5):
    df['iter'] = k
    # Store an independent snapshot, not a reference to df
    dict_[k] = df.copy()
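
To see why the copy matters, here is a minimal sketch that stores both a plain reference and a copy in each iteration. The names `aliased` and `copied` are made up for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame(index=range(20))
aliased = {}  # every value will be a reference to the same DataFrame
copied = {}   # every value will be an independent snapshot

for k in range(5):
    df['iter'] = k
    aliased[k] = df        # no new object is created here
    copied[k] = df.copy()  # new DataFrame with its own data

# All aliased entries are the very same object as df
print(all(v is df for v in aliased.values()))  # True

# So they all show the value from the last iteration
print(int(aliased[0]['iter'].iloc[0]))  # 4

# The copies keep the value from their own iteration
print([int(copied[k]['iter'].iloc[0]) for k in range(5)])  # [0, 1, 2, 3, 4]
```

Assigning `df` to a dictionary key only binds another name to the same object, which is why the later `df['iter'] = k` shows up under every key.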

Upvotes: 1

Scott Hunter

Reputation: 49838

Each entry in dict (a terrible name, BTW, as it is already the name of the built-in type) needs to be a copy of df; otherwise every key refers to the same DataFrame object.
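
As a possible alternative (not what either answer proposes), if the per-iteration work only adds or overwrites columns, `DataFrame.assign` returns a new DataFrame and leaves the original untouched, so each stored entry is already independent. A rough sketch, with `frames` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame(index=range(20))
frames = {}  # illustrative name, avoids shadowing the built-in dict

for k in range(5):
    # assign() builds a new DataFrame with the extra column;
    # df itself is not modified
    frames[k] = df.assign(iter=k)

print(frames[1] is frames[2])                 # False
print(int(frames[1]['iter'].iloc[0]))         # 1
print('iter' in df.columns)                   # False
```

Whether this fits depends on the real loop body; for arbitrary in-place modifications, an explicit `df.copy()` is the more general fix.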

Upvotes: 2
