Say I have a dataframe which contains the locations of places.
import pandas as pd
df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'] })
Then I have these places, grouped by the main city they are contained in, stored in a dictionary:
dict ={
['Hackney', 'Mile End', 'Croydon', 'Wembley'] : 'London',
['Edgbaston'] : 'Birmingham'
}
Question: How could I create a new column (say df1['city']) which uses this dictionary to populate the city that each entry of the location column is in? Note: if creating a dictionary isn't the ideal way to do this, feel free to suggest an alternative.
Ideal output: something like the one shown below (this should generalise to more entries, provided the dictionary is extended as needed).
df1 = pd.DataFrame({'col1': [1,2,3,4,5], 'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley'], 'city': ['London','London','London','Birmingham','London'] })
Tried: using the apply method, but it seems to give an error:
df1['city'] = df1['location'].apply(dict)
Your dictionary is not valid: lists are unhashable, so they cannot be used as dictionary keys, but you can use lists as dictionary values. Also, don't name the dictionary dict, because that shadows the Python builtin:
d = { 'London': ['Hackney', 'Mile End', 'Croydon'],
'Birmingham': ['Edgbaston']}
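To see why the question's original dictionary cannot be built, here is a minimal plain-Python check (the names bad, ok_tuple_keys and ok_list_values are just for illustration): a list key raises TypeError, while a tuple key, or a list used as a value, is accepted.

```python
# Lists are mutable, hence unhashable, so Python refuses them as dict keys.
error_message = None
try:
    bad = {['Hackney', 'Mile End']: 'London'}
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the message mentions: unhashable type: 'list'

# A tuple is immutable and hashable, so it works as a key,
# and a list is perfectly fine as a value.
ok_tuple_keys = {('Hackney', 'Mile End'): 'London'}
ok_list_values = {'London': ['Hackney', 'Mile End']}
print(ok_tuple_keys, ok_list_values)
```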
Here it is possible to flatten the values in the lists into a location-to-city dictionary and then use Series.map; if a location does not exist in the dictionary, a missing value (NaN) is returned:
d1 = {x: k for k, v in d.items() for x in v}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}
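The comprehension reads its for clauses in the same order as nested loops. Written out explicitly, it is equivalent to the sketch below (city and locations are illustrative names). Note that if one location appeared under two cities, the later city would overwrite the earlier one.

```python
d = {'London': ['Hackney', 'Mile End', 'Croydon'],
     'Birmingham': ['Edgbaston']}

# Same result as {x: k for k, v in d.items() for x in v}:
# the outer loop walks the cities, the inner loop walks each city's locations.
d1 = {}
for city, locations in d.items():
    for location in locations:
        d1[location] = city

print(d1)
```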
df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
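Wembley is NaN here only because it is missing from d; the question lists it under London. A self-contained sketch with Wembley added back to the London list reproduces the ideal output from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley']})

# Same grouping as in the question, with Wembley under London.
d = {'London': ['Hackney', 'Mile End', 'Croydon', 'Wembley'],
     'Birmingham': ['Edgbaston']}

# Flatten to location -> city, then look up each location.
d1 = {x: k for k, v in d.items() for x in v}
df1['city'] = df1['location'].map(d1)
print(df1)
```

Any location that is still missing from d will keep coming out as NaN, so the lists simply need extending as new places appear.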
If the dictionary format uses tuples as keys:
d ={('Hackney', 'Mile End', 'Croydon') : 'London', ('Edgbaston', ) : 'Birmingham'}
d1 = {x: v for k, v in d.items() for x in k}
print (d1)
{'Hackney': 'London', 'Mile End': 'London', 'Croydon': 'London', 'Edgbaston': 'Birmingham'}
df1['city'] = df1['location'].map(d1)
print (df1)
   col1   location        city
0     1    Hackney      London
1     2   Mile End      London
2     3    Croydon      London
3     4  Edgbaston  Birmingham
4     5    Wembley         NaN
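The trailing comma in ('Edgbaston', ) matters: without it, the parentheses only group the string, and the flattening comprehension would iterate over its characters. A small check (wrong, right, wrong_flat and right_flat are illustrative names):

```python
# ('Edgbaston') is just the string 'Edgbaston' in parentheses;
# the trailing comma is what makes ('Edgbaston', ) a one-element tuple.
print(type(('Edgbaston')).__name__)    # str
print(type(('Edgbaston', )).__name__)  # tuple

wrong = {('Edgbaston'): 'Birmingham'}
right = {('Edgbaston', ): 'Birmingham'}

# Flattening iterates over each key, so a bare string key
# is split into single characters.
wrong_flat = {x: v for k, v in wrong.items() for x in k}
right_flat = {x: v for k, v in right.items() for x in k}
print(wrong_flat)
print(right_flat)  # {'Edgbaston': 'Birmingham'}
```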
You cannot have a Python dict with mutable (unhashable) keys, which means you probably need a tuple instead of a list:
d = {
    ('Hackney', 'Mile End', 'Croydon'): 'London',
    ('Edgbaston', ): 'Birmingham'
}
Once you have this, you can use Series.map to map each location to a city. If each key of your dictionary were a single location, you could pass the dictionary to map directly, but with tuple keys you can define a function instead:
def get_city(location):
    # Return the city whose tuple of locations contains this location
    for key, city in d.items():
        if location in key:
            return city
    # No tuple contains the location
    return None
df1['location'].map(get_city)
#0 London
#1 London
#2 London
#3 Birmingham
#4 None
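For comparison, a self-contained sketch that runs the same get_city logic side by side with the flattened-dictionary approach from the other answer (by_function, by_dict and flat are illustrative names). get_city scans every key for every row, which is fine for a few cities; flattening the tuple keys once turns each row into a single lookup. Both leave the unmatched Wembley row as a missing value, and isna() treats None and NaN alike.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5],
                    'location': ['Hackney', 'Mile End', 'Croydon', 'Edgbaston', 'Wembley']})
d = {('Hackney', 'Mile End', 'Croydon'): 'London',
     ('Edgbaston', ): 'Birmingham'}

def get_city(location):
    # Scan every tuple of locations and return the first matching city.
    for locations, city in d.items():
        if location in locations:
            return city
    return None  # no match

by_function = df1['location'].map(get_city)

# Flatten once: each location becomes its own key.
flat = {x: city for locations, city in d.items() for x in locations}
by_dict = df1['location'].map(flat)

# Either way, pandas treats the unmatched Wembley row as missing.
print(by_function.isna().tolist())
print(by_dict.isna().tolist())
```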