ropolo

Reputation: 137

Using dictionary keys in pandas dataframe columns

I wrote the following code in which I create a dictionary of pandas dataframes:

import pandas as pd
import numpy as np

classification = pd.read_csv('classification.csv')

thresholdRange = np.arange(0, 70, 0.5).tolist()

classificationDict = {}

for t in thresholdRange:
    classificationDict[t] = classification

for k, v in classificationDict.iteritems():
    v['Threshold'] = k

I want to add a column called 'Threshold' to every dataframe in the dictionary, filled with the key under which that dataframe is stored. However, with the code above I get the same value in all dataframes. What am I missing here? Perhaps I am complicating things for myself with this approach, but I'd greatly appreciate your help.

Upvotes: 0

Views: 851

Answers (2)

Ilja

Reputation: 2114

Sorry, I got your question wrong. Now this is the issue:

Obviously, classification (a pandas dataframe, I suppose) is a mutable object, and putting the same mutable object into a list or a dict several times behaves in a way that surprises Python beginners: the same object is stored each time, not a copy. If you modify it through one entry, the change shows up in all of them. Try this:

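a = [1]
b = [a, a]      # both entries refer to the same list object
b[0][0] = 2     # mutate that object through the first entry
print(b[1])     # prints [2]: the second entry changed too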

This is what happens in your dict: every key refers to the same dataframe. You have to store a different object under each key, and pandas dataframes have a .copy() method for this. Alternatively, I found this post for you, which covers (in essence) the same problem and offers further solutions:
https://stackoverflow.com/a/2612815/6053327
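As a minimal sketch of the .copy() approach, assuming Python 3 (so .items() instead of .iteritems()): the small inline dataframe and its 'score' column are hypothetical stand-ins for classification.csv, and the threshold list is shortened for readability.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('classification.csv')
classification = pd.DataFrame({"score": [10, 20, 30]})

# Shortened threshold list, for illustration only
thresholdRange = [0.0, 0.5, 1.0]

classificationDict = {}
for t in thresholdRange:
    # .copy() gives each key its own independent dataframe
    frame = classification.copy()
    frame["Threshold"] = t
    classificationDict[t] = frame

# Each dataframe now carries its own key in the 'Threshold' column
for k, v in classificationDict.items():
    print(k, v["Threshold"].unique())
```

Because every key holds its own copy, the original classification dataframe is left untouched and does not get a 'Threshold' column at all.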

Upvotes: 1

Ilja

Reputation: 2114

Of course you get the same value. You are doing the same assignment over and over again in

for k, v in classificationDict.iteritems():

because all of your v values are identical: you assigned the same dataframe to every key in the first for loop.
Did you try debugging this yourself by printing classification? I assume that it is only the first line?
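One way to do that kind of debugging is to check whether the dict values are really distinct objects. This is only a sketch under assumptions: it uses Python 3, and the inline dataframe with its 'score' column is a hypothetical stand-in for the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('classification.csv')
classification = pd.DataFrame({"score": [10, 20, 30]})

classificationDict = {}
for t in [0.0, 0.5, 1.0]:
    classificationDict[t] = classification  # same object stored under every key

# Collect the identities of all values; shared objects collapse to one id
distinct_ids = {id(v) for v in classificationDict.values()}
print(len(distinct_ids))

for k, v in classificationDict.items():
    v["Threshold"] = k  # each pass overwrites the column on the one shared frame

# The dict values are the original dataframe itself, not copies
print(classificationDict[0.0] is classification)
print(classification["Threshold"].unique())
```

If the number of distinct ids is 1, every key points at the same dataframe, and whichever key the second loop visits last determines the value you see everywhere.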

Upvotes: 0
