Keperoze

Reputation: 73

Pandas appending dictionary values with iterrows row values

I have a dict of city names, each having an empty list as its value. I am trying to use df.iterrows() to append each person's fullname to the list under their city key:

for index, row in df.iterrows():
    dict[row['city']].append(row['fullname'])

Can somebody explain why the code above appends every 'fullname' value to every key, instead of appending each name only to its own city's key?

That is, instead of getting the result

{"City1":["Name1","Name2"],"City2":["Name3","Name4"]}

I'm getting

{"City1":["Name1","Name2","Name3","Name4"],"City2":["Name1","Name2","Name3","Name4"]}

Edit: providing a sample of the dataframe:

import pandas as pd

d = {'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
     'city': ['Arizona', 'Arizona', 'California', 'California']}
df = pd.DataFrame(data=d)

Edit 2: I'm pretty sure that my problem lies in my dict, since I created it in the following way:

cities = []
for i in df['city']:
    cities.append(i)
    
dict = dict.fromkeys(set(cities), [])
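
A minimal, self-contained reproduction of the behaviour described above, using the sample DataFrame from the first edit. The dict is named cities_dict here (a hypothetical rename) so the built-in dict is not shadowed. The `is` check shows that both keys hold the very same list object.

```python
import pandas as pd

df = pd.DataFrame({
    'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
    'city': ['Arizona', 'Arizona', 'California', 'California'],
})

# fromkeys evaluates [] once, so every key is bound to that one list
cities_dict = dict.fromkeys(set(df['city']), [])
print(cities_dict['Arizona'] is cities_dict['California'])  # True

for index, row in df.iterrows():
    cities_dict[row['city']].append(row['fullname'])

# Every key now shows all four names, because they all share one list
print(cities_dict)
```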

When I print dict, I get the expected output:

{"Arizona":[],"California":[]}

However, if I access a single key, dict['Arizona'], I get this:

{"index":[],"columns":[],"data":[]}

Upvotes: 0

Views: 866

Answers (2)

user15398259


The problem is indeed .fromkeys(): its value argument is evaluated only once, so all of the keys point to the same list object.

>>> dict.fromkeys(['one', 'two'], [])
{'one': [], 'two': []}
>>> d = dict.fromkeys(['one', 'two'], [])
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': ['three']}

You need a dict comprehension to create a distinct list for each key:

>>> d = { k: [] for k in ['one', 'two'] }
>>> d
{'one': [], 'two': []}
>>> d['one'].append('three')
>>> d
{'one': ['three'], 'two': []}
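
Applied to the question's own loop and sample data, the comprehension fix might look like the sketch below. Using df['city'].unique() to collect the keys is an assumption of this sketch; it replaces the manual list-plus-set step from the question and keeps the cities in order of first appearance.

```python
import pandas as pd

df = pd.DataFrame({
    'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
    'city': ['Arizona', 'Arizona', 'California', 'California'],
})

# The comprehension evaluates [] once per key, so each city gets its own list
cities_dict = {city: [] for city in df['city'].unique()}

for _, row in df.iterrows():
    cities_dict[row['city']].append(row['fullname'])

print(cities_dict)
# {'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
```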

Your loop is also a manual reimplementation of groupby:

>>> df.groupby('city')['fullname'].agg(list)
city
Arizona       [Jason, Katty]
California    [Molly, Nicky]
Name: fullname, dtype: object

If you want a dict rather than a Series, add .to_dict():

>>> df.groupby('city')['fullname'].agg(list).to_dict()
{'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
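
If you would rather keep an explicit loop than use groupby, collections.defaultdict(list) is another way around the fromkeys pitfall (a standard-library swap-in, not part of this answer). It calls list() to create a new list the first time each key is looked up, so the dict doesn't need to be pre-built. A minimal sketch with the sample data:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({
    'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
    'city': ['Arizona', 'Arizona', 'California', 'California'],
})

# A missing key triggers list(), giving that key a fresh, unshared list
by_city = defaultdict(list)
for _, row in df.iterrows():
    by_city[row['city']].append(row['fullname'])

# Convert back to a plain dict so later lookups of unknown keys raise KeyError
result = dict(by_city)
print(result)
# {'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
```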

Upvotes: 1

Alex Legaria

Reputation: 73

I'm surprised it works at all, because row is a Series.

How about this alternative approach:

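for city in your_dict.keys():
    # Rebind each key to a new list; += would extend a list shared by all keys if the dict came from dict.fromkeys
    your_dict[city] = df.loc[df["city"] == city, "fullname"].tolist()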
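
Whether this loop uses += or = matters when the dict was built with dict.fromkeys(..., []) as in the question. For lists, += calls list.__iadd__, which extends the existing list in place, so it writes into the single shared list and reproduces the original bug. Plain = rebinds each key to a new list. A sketch contrasting the two, with shared and rebound as hypothetical names:

```python
import pandas as pd

df = pd.DataFrame({
    'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
    'city': ['Arizona', 'Arizona', 'California', 'California'],
})

# Built the way the question builds it: one list shared by every key
shared = dict.fromkeys(df['city'].unique(), [])
for city in shared.keys():
    # += mutates the shared list in place, so every key sees every name
    shared[city] += list(df['fullname'][df['city'] == city])

# Same starting dict, but each key is rebound to a brand-new list
rebound = dict.fromkeys(df['city'].unique(), [])
for city in rebound.keys():
    rebound[city] = df.loc[df['city'] == city, 'fullname'].tolist()

print(shared)
# {'Arizona': ['Jason', 'Katty', 'Molly', 'Nicky'], 'California': ['Jason', 'Katty', 'Molly', 'Nicky']}
print(rebound)
# {'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
```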

You should avoid iterating over DataFrame rows unless it's truly necessary; vectorized operations such as boolean masks are usually simpler and faster.
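
When a Python-level loop really is needed, iterating over the two columns with zip() avoids the per-row Series that iterrows() builds. This is a general Python idiom rather than something from this answer; a minimal sketch with the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'fullname': ['Jason', 'Katty', 'Molly', 'Nicky'],
    'city': ['Arizona', 'Arizona', 'California', 'California'],
})

cities_dict = {city: [] for city in df['city'].unique()}

# zip walks both columns in step; each item is a plain (city, name) pair
for city, name in zip(df['city'], df['fullname']):
    cities_dict[city].append(name)

print(cities_dict)
# {'Arizona': ['Jason', 'Katty'], 'California': ['Molly', 'Nicky']}
```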

Upvotes: 2
