Chris M-B

Reputation: 62

Appending value to a list based on dictionary key

I started writing Python scripts for my research this past summer and have been picking up the language as I go. For my current work, I have a dictionary of lists, sample_range_dict, initialized with descriptor_cols as the keys (col_list in the sample code below) and an empty list as each value. Sample code is below:

import numpy as np
import pandas as pd

def rangeFunc(arr):
    return (np.max(arr) - np.min(arr))

df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD"))  # random dataframe for testing
col_list = df_sample.columns

sample_range_dict = dict.fromkeys(col_list, [])  # creates dictionary where each key pairs with an empty list
rand_df = df_sample.sample(n=20)  # make a new dataframe with 20 random rows of df_sample

I want to go through each column of rand_df, calculate the range of its values, and put each range in the list stored under that column's name (e.g. sample_range_dict["A"] = [range in column A]). The following is the code I initially thought to use for this:

for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))

However, instead of each key having one item in the list, printing sample_range_dict shows each key having an identical list of 4 values:

{'A': [0.8404352070810013,
  0.9766398946246098,
  0.9364714925930782,
  0.9801082480908744],
 'B': [0.8404352070810013,
  0.9766398946246098,
  0.9364714925930782,
  0.9801082480908744],
 'C': [0.8404352070810013,
  0.9766398946246098,
  0.9364714925930782,
  0.9801082480908744],
 'D': [0.8404352070810013,
  0.9766398946246098,
  0.9364714925930782,
  0.9801082480908744]}

I've determined that the first value is the range for "A", the second is the range for "B", and so on. Why is this happening, and how can I rewrite the code so that each key's list gets only one item, the range of its own column?

P.S. I'm planning to make this an iterative process (repeating the sampling and appending a new range each time), hence using lists instead of single numbers.

Upvotes: 2

Views: 154

Answers (1)

kaya3

Reputation: 51037

The issue is this line:

sample_range_dict = dict.fromkeys(col_list, [])

You only created one list: dict.fromkeys evaluates [] once and uses that same list object as the value for every key. You don't have four lists with the same elements; you have one list and four references to it. When you append through one reference, the new element is visible through all the other references, because they all point to the same list:

>>> a = dict.fromkeys(['x', 'y', 'z'], [])
>>> a['x'] is a['y']
True
>>> a['x'].append(5)
>>> a['y']
[5]
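The same sharing can be confirmed on the question's own setup. This is a minimal sketch that reuses df_sample, col_list, rand_df and rangeFunc from the question; because the data is random, the range values themselves will differ from run to run, but the counts printed below will not.

```python
import numpy as np
import pandas as pd

def rangeFunc(arr):
    # Range of the values: maximum minus minimum
    return np.max(arr) - np.min(arr)

df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD"))
col_list = df_sample.columns
rand_df = df_sample.sample(n=20)

# dict.fromkeys evaluates [] once, so all four keys share one list object
sample_range_dict = dict.fromkeys(col_list, [])
print(len({id(v) for v in sample_range_dict.values()}))  # 1 distinct list

for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))

# All four appends landed in that single shared list
print(len(sample_range_dict["A"]))  # 4
```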

If you want each key to have a different list, either create a new list for each key:

>>> a = { k: [] for k in ['x', 'y', 'z'] }
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
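Applied to the question's code, only the line that builds sample_range_dict needs to change. A sketch, again reusing the question's setup (df_sample, col_list, rand_df, rangeFunc):

```python
import numpy as np
import pandas as pd

def rangeFunc(arr):
    # Range of the values: maximum minus minimum
    return np.max(arr) - np.min(arr)

df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD"))
col_list = df_sample.columns
rand_df = df_sample.sample(n=20)

# The comprehension evaluates [] once per key, so every key gets its own list
sample_range_dict = {col: [] for col in col_list}

for d in col_list:
    sample_range_dict[d].append(rangeFunc(rand_df[d].tolist()))

for col, ranges in sample_range_dict.items():
    print(col, ranges)  # one range per column
```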

Or use a defaultdict, which creates a new empty list the first time each missing key is accessed:

>>> from collections import defaultdict
>>> a = defaultdict(list)
>>> a['x'] is a['y']
False
>>> a['x'].append(5)
>>> a['y']
[]
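For the iterative process mentioned in the question's P.S., a defaultdict also removes the need to set up the keys in advance. The sketch below assumes the sampling is simply repeated a fixed number of times; n_rounds is a hypothetical name, not from the question. It also swaps rangeFunc for rand_df.max() - rand_df.min(), which computes every column's range at once as a Series indexed by column name.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df_sample = pd.DataFrame(np.random.rand(2000, 4), columns=list("ABCD"))

n_rounds = 5  # hypothetical number of repetitions
sample_range_dict = defaultdict(list)

for _ in range(n_rounds):
    rand_df = df_sample.sample(n=20)
    # Per-column range (max - min) as a Series indexed by column name
    ranges = rand_df.max() - rand_df.min()
    for col, value in ranges.items():
        # The first access to a key creates a fresh, independent list
        sample_range_dict[col].append(value)

print(dict(sample_range_dict))  # each key maps to n_rounds values
```

Each round appends exactly one value per column, so after the loop every list holds n_rounds ranges for its own column only.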

Upvotes: 2
