Reputation: 73
I have a dict of city names, each having an empty list as a value. I am trying to use
df.iterrows()
to append corresponding names to each dict key(city):
for index, row in df.iterrows():
dict[row['city']].append(row['fullname'])
Can somebody explain why the code above appends all possible 'fullname' values to each dict's key instead of appending them to their respective city keys?
I.e. instead of getting the result
{"City1":["Name1","Name2"],"City2":["Name3","Name4"]}
I'm getting
{"City1":["Name1","Name2","Name3","Name4"],"City2":["Name1","Name2","Name3","Name4"]}
Edit: providing a sample of the dataframe:
d = {'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
'city': ['Arizona', 'Arizona', 'California', 'California']}
df = pd.DataFrame(data=d)
Edit 2: I'm pretty sure that my problem lies in my dict, since I created it in the following way:
cities = []
for i in df['city']:
cities.append(i)
dict = dict.fromkeys(set(cities), [])
when I call dict, i get the correct output:
{"Arizona":[],"California":[]}
However if I specify a key dict['Arizona']
, i get this:
{"index":[],"columns":[],"data":[]}
Upvotes: 0
Views: 866
Reputation:
The problem is indeed .fromkeys
- the default value is evaluated once - so all of the keys are "pointing to" the same list.
>>> dict.fromkeys(['one', 'two'], [])
{'one': [], 'two': []}
>>> d = dict.fromkeys(['one', 'two'], [])
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': ['three']}
You'd need a comprehension to create a distinct list for each key.
>>> d = { k: [] for k in ['one', 'two'] }
>>> d
{'one': [], 'two': []}
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': []}
You are also manually implementing a groupby with your code:
>>> df.groupby('city')['fullname'].agg(list)
city
Arizona [Jason, Katty]
California [Molly, Nicky]
Name: fullname, dtype: object
If you want a dict:
>>> df.groupby('city')['fullname'].agg(list).to_dict()
{'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
Upvotes: 1
Reputation: 73
I'm surprised it works at all, because row
is a Series.
How about this alternative approach:
for city in your_dict.keys():
your_dict[city] += list(df["fullname"][df["city"] == city])
You should always avoid iterating through dataframes unless it's absolutely necessary.
Upvotes: 2