I am working with a dataframe inside a loop. In each iteration, several operations are performed on the dataframe's columns. At the end of each iteration, I need to store the dataframe in a dictionary, keyed by the iteration index.
For example:
import pandas as pd
df = pd.DataFrame(index=range(20))
dict = {}
for k in range(5):
    df['iter'] = k
    dict[k] = df
I expect 'dict' to end up holding 5 dataframes. For key 1, it should hold a dataframe whose 'iter' column contains only the value 1; similarly, for key 2, every value in 'iter' should be 2.
However, every key holds the same dataframe: in all of them, the 'iter' column contains only the value 4.
I tried running the operations step by step instead of in a loop. Initially, the correct dataframe is stored, but in the next step, when I run
df['iter'] = k
the dataframe already stored in the dictionary is updated as well.
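A minimal sketch reproducing the behaviour described above, assuming a recent pandas and using a hypothetical dictionary name `frames` instead of `dict`. It checks that every stored value is literally the same object as `df`, which is why each entry shows the last value written.

```python
import pandas as pd

df = pd.DataFrame(index=range(20))
frames = {}  # hypothetical name, used instead of shadowing the built-in dict
for k in range(5):
    df['iter'] = k
    frames[k] = df  # stores a reference to the same object, not a snapshot

# Every value in the dictionary is the very same DataFrame object ...
same_object = all(v is df for v in frames.values())
# ... so every entry shows the last value written to 'iter'.
last_values = {k: v['iter'].unique().tolist() for k, v in frames.items()}
print(same_object)  # True
print(last_values)  # {0: [4], 1: [4], 2: [4], 3: [4], 4: [4]}
```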
How can I get around this? My actual dataframe is much bigger and has many more operations that need to be performed inside the loop.
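You need to store a copy of the dataframe; otherwise every dictionary entry refers to the same object, which the next iteration modifies. (Also, dict is a terrible name: don't shadow built-in names like dict with your own variables. If you really want that name, add a trailing underscore, as in dict_.)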
import pandas as pd
df = pd.DataFrame(index=range(20))
dict_ = {}
for k in range(5):
    df['iter'] = k
    dict_[k] = df.copy()  # store an independent snapshot, not a reference
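If the per-iteration work only derives new columns, another option (a swapped-in technique, not part of this answer) is `DataFrame.assign`, which returns a new DataFrame and leaves the original untouched, so no explicit copy is needed. A minimal sketch, where `base` and `frames` are hypothetical names:

```python
import pandas as pd

base = pd.DataFrame(index=range(20))  # hypothetical starting frame
frames = {}  # hypothetical name for the result dictionary
for k in range(5):
    # assign() returns a new DataFrame; `base` itself is never modified
    frames[k] = base.assign(iter=k)

snapshots = {k: v['iter'].unique().tolist() for k, v in frames.items()}
print(snapshots)               # {0: [0], 1: [1], 2: [2], 3: [3], 4: [4]}
print('iter' in base.columns)  # False
```

This fits when each iteration starts from the same base frame; if each iteration builds on the previous one, mutating `df` and storing `df.copy()` as in the answer is the more direct route.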
Each entry in dict (a terrible name, by the way, since it is already the name of the built-in type) needs to be a copy of df.
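To illustrate why reusing the name of the built-in type is risky, here is a small sketch (the function name `shadowing_demo` is hypothetical). The shadowing is confined to a function so the built-in `dict` stays usable afterwards:

```python
def shadowing_demo():
    dict = {}  # shadows the built-in type inside this function
    try:
        dict(a=1)  # now tries to call the empty dictionary object
    except TypeError as exc:
        return str(exc)
    return None


message = shadowing_demo()
print(message)       # 'dict' object is not callable
print(dict(a=1))     # {'a': 1}  (outside the function the built-in is intact)
```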