Curious

Reputation: 335

Creating a new column in pandas based on a list of values inside a dictionary

Say I have a dataframe which contains the locations of places.

df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'] })

I also have a dictionary that records which main city each of these places belongs to:

dict ={
['Hackney', 'Mile End', 'Croydon', 'Wembley'] : 'London',
['Edgbaston'] : 'Birmingham'
}

Question: How can I create a new column (say df1['city']) that uses this dictionary to record which city each entry in the location column belongs to? Note: If a dictionary isn't the ideal way to do this, feel free to suggest an alternative.

Ideal Output: I would like something like the output shown below (this should generalise to more entries, provided the dictionary is extended as needed).

df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'], 'city': ['London','London','London','Birmingham','London'] })

Tried: Using the apply method, but it seems to give an error:

df1['city'] = df1['location'].apply(dict)

Upvotes: 1

Views: 137

Answers (2)

jezrael

Reputation: 862641

Your dictionary is not valid, because lists cannot be dictionary keys. You can store the lists as the dictionary's values instead. Also, do not name the dictionary dict, because that shadows the Python builtin:

d = { 'London': ['Hackney', 'Mile End', 'Croydon'],
     'Birmingham': ['Edgbaston']}

You can flatten the lists of values into a dictionary keyed by location and then use Series.map; a location with no match in the dictionary gets a missing value (NaN):

d1 = {x: k for k, v in d.items() for x in v}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}

df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
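
If the NaN for unmatched locations is not wanted, one option is to fill it with a placeholder after mapping. The following is a minimal sketch that rebuilds the same data as above; the label 'Unknown' and the variable name unmapped are assumptions made for illustration, not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley']})

d = {'London': ['Hackney', 'Mile End', 'Croydon'],
     'Birmingham': ['Edgbaston']}

# Flatten {city: [locations]} into {location: city}
d1 = {loc: city for city, locs in d.items() for loc in locs}

# 'Unknown' is a hypothetical placeholder for locations missing from d1
df1['city'] = df1['location'].map(d1).fillna('Unknown')

# List the locations that still need an entry in the dictionary
unmapped = df1.loc[df1['city'].eq('Unknown'), 'location'].tolist()
```

Checking unmapped is a quick way to see which places (here Wembley) should be added to the dictionary.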

If the dictionary uses tuples as keys:

d ={('Hackney', 'Mile End', 'Croydon') : 'London', ('Edgbaston', ) : 'Birmingham'}


d1 = {x: v for k, v in d.items() for x in k}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}

df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
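
Since the question also asks for alternatives to a dictionary, another possibility is to keep the mapping in a small lookup DataFrame and do a left merge. This is only a sketch under the assumption that each location appears once in the lookup table; the names lookup and result are made up for the example.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley']})

# Hypothetical lookup table: one row per location (locations must be unique,
# otherwise the merge would duplicate rows of df1)
lookup = pd.DataFrame({'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston'],
                       'city': ['London', 'London', 'London', 'Birmingham']})

# how='left' keeps every row of df1; unmatched locations get NaN in 'city'
result = df1.merge(lookup, on='location', how='left')
```

A lookup table like this can be kept in a CSV file and extended without touching the code, which may suit a longer list of places.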

Upvotes: 1

Mortz

Reputation: 4879

You cannot have a Python dict with unhashable keys such as lists, which means you probably need a tuple instead of a list.
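
The restriction is easy to check directly. In this small sketch (the names bad, good and message are chosen only for illustration), building a dict with a list as a key raises a TypeError, while the same key written as a tuple is accepted:

```python
message = None
try:
    # A list is unhashable, so it cannot be used as a dict key
    bad = {['Edgbaston']: 'Birmingham'}
except TypeError as exc:
    message = str(exc)

# A tuple of strings is hashable, so this works
good = {('Edgbaston',): 'Birmingham'}
```

With tuple keys, the dictionary from the question can be written as follows.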

city_by_locations = {
    ('Hackney', 'Mile End', 'Croydon'): 'London',
    ('Edgbaston',): 'Birmingham'
}

Once you have this, you can use the map function to map a location to a city. If your dict had single locations as keys rather than tuples, you could have passed it to map directly, but in this case you can define a function:

def get_city(location):
    # Return the city whose tuple of locations contains this location
    for locations, city in city_by_locations.items():
        if location in locations:
            return city
    # No match found
    return None

df1['location'].map(get_city)
#0        London
#1        London
#2        London
#3    Birmingham
#4          None

Upvotes: 1
