Juan C

Reputation: 6132

Create categorical column based on string values

I have kind of a simple problem, but I'm having trouble achieving what I want. I have a district column, with 32 different values for all districts in a city. I want to create a column "sector" that says which sector that district belongs to. I thought the obvious approach was through a dictionary and map, but couldn't make it work:

sectores = {'sector oriente': ['Vitacura', 'Las Condes', 'Lo Barnechea', 'La Reina', 'Ñuñoa', 'Providencia'],
            'sector suroriente': ['Peñalolén', 'La Florida', 'Macul'],
            'sector sur': ['La Granja', 'La Pintana', 'Lo Espejo', 'San Ramón', 'La Cisterna', 'El Bosque', 'Pedro Aguirre Cerda', 'San Joaquín', 'San Miguel'],
            'sector surponiente': ['Maipú', 'Estación Central', 'Cerrillos'],
            'sector norponiente': ['Cerro Navia', 'Lo Prado', 'Pudahuel', 'Quinta Normal', 'Renca'],
            'sector norte': ['Conchalí', 'Huechuraba', 'Independencia', 'Recoleta', 'Quilicura'],
            'sector centro': ['Santiago']}

I then noticed I needed to swap the keys and values:

sectores = dict((y,x) for x,y in sectores.items())

Then tried to map it:

df['sectores']=df['district'].map(sectores)

But I'm getting:

TypeError: unhashable type: 'list'

Is this the right approach? Should I try something else? Thanks in advance!

Edit: This is what df['district'] looks like:

district

Maipú
Quilicura
Independencia
Conchalí
...

Upvotes: 1

Views: 169

Answers (1)

rafaelc

Reputation: 59274

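You are trying to use lists as the keys in your dict, which is not possible because lists are mutable and therefore not hashable. Note that the TypeError is raised when the swapped dict is built, before map is ever called.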
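To make the hashability point concrete, here is a minimal sketch in plain Python. The names `districts`, `lookup` and `error_message` are made up for illustration and are not part of the question's code.

```python
# A list of districts, as found in the values of the question's dictionary.
districts = ["Peñalolén", "La Florida", "Macul"]

error_message = None
try:
    # Using the list itself as a dict key fails, because lists are unhashable.
    lookup = {districts: "sector suroriente"}
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# An immutable tuple is hashable, so this works as a key. It would still not
# help with map, though, since the column holds single district names.
lookup = {tuple(districts): "sector suroriente"}
print(lookup)
```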

Instead, use the strings by iterating through the values:

sectores = {i: k for k, v in sectores.items() for i in v}
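As a rough illustration of what that comprehension does, here is the same idea on a two-sector subset of the dictionary, with longer variable names. The names `sector_to_districts` and `district_to_sector` are assumptions chosen for readability.

```python
# Subset of the question's dictionary: sector -> list of districts.
sector_to_districts = {
    "sector suroriente": ["Peñalolén", "La Florida", "Macul"],
    "sector centro": ["Santiago"],
}

# Flatten it: one entry per district, each pointing back to its sector.
district_to_sector = {
    district: sector
    for sector, districts in sector_to_districts.items()
    for district in districts
}

for district, sector in district_to_sector.items():
    print(district, "->", sector)
```

If a district were listed under two sectors, the later one would silently overwrite the earlier one, so it may be worth checking that the flattened dict has as many entries as you expect (32 in the question).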

Then you can use pd.Series.map with the flattened dict, and this should work:

df['sectores']=df['district'].map(sectores)
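A small end-to-end sketch of the mapping step, assuming pandas is installed. The DataFrame below is invented from the sample rows in the question, and "Somewhere Else" is a made-up value added only to show what happens to a district that is missing from the dict.

```python
import pandas as pd

# Flattened mapping (district -> sector) for the sample rows in the question.
district_to_sector = {
    "Maipú": "sector surponiente",
    "Quilicura": "sector norte",
    "Independencia": "sector norte",
    "Conchalí": "sector norte",
}

# Hypothetical data; the last row is deliberately absent from the mapping.
df = pd.DataFrame(
    {"district": ["Maipú", "Quilicura", "Independencia", "Conchalí", "Somewhere Else"]}
)

# map looks each district up in the dict; unknown districts become NaN.
df["sectores"] = df["district"].map(district_to_sector)

# List the districts that did not match, e.g. because of spelling differences.
unmapped = df.loc[df["sectores"].isna(), "district"].tolist()
print(unmapped)

# Optional: store the result as a pandas categorical column.
df["sectores"] = df["sectores"].astype("category")
print(df)
```

Checking for NaN after the map is a cheap way to catch accent or spelling mismatches between the column and the dictionary keys.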

Upvotes: 1
