toceto

Reputation: 637

Remove all non-letter characters and separate into new columns

I have a column in my DataFrame called "amenities", and this is what one record looks like:

print(df["amenities"][0])

{"Wireless Internet",Kitchen,"Free parking on premises",Breakfast,Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,"Hair dryer","Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}

What I'm trying to do is remove the special characters and then split the values so that every amenity has its own column:

Room  Amenity1           Amenity2  Amenity3      Amenity4
1     Wireless Internet  Kitchen   Free Parking  Breakfast

What I did is:

import re

df['amenities'] = df['amenities'].map(lambda x: re.sub(r'\W+', ' ', x))
Wireless Internet Air conditioning Pool Kitchen Free parking on premises Gym Hot tub Indoor fireplace Heating Family kid friendly Suitable for events Washer Dryer Smoke detector Carbon monoxide detector First aid kit Fire extinguisher Essentials Shampoo Lock on bedroom door 24 hour check in Hangers Hair dryer Iron Laptop friendly workspace

This cleans the string, but now I do not know how to separate the amenities into their own columns, because "Wireless Internet" should be one column, not two.

Upvotes: 2

Views: 109

Answers (1)

Arne

Reputation: 20117

In general, you want to use list comprehensions instead of map with a lambda. They are more readable and usually achieve the same thing. You could go about it like this:

sc_sub = re.compile(r'\W+')
df['amenities'] = [sc_sub.sub('', amenity) for amenity in df['amenities']]
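Note that stripping every non-word character also removes the commas and quotes that mark where one amenity ends and the next begins, which is why "Wireless Internet" can no longer be told apart from two separate items. A minimal sketch of an alternative, assuming every record has the `{"a",b,"c d"}` shape shown in the question: drop the braces and let the standard `csv` module handle the optional quotes. The helper name `parse_amenities` is hypothetical, not part of pandas.

```python
import csv


def parse_amenities(raw):
    """Split one amenities string such as '{"Wireless Internet",Kitchen}' into a list.

    Assumes the value is wrapped in braces and items are comma-separated,
    with double quotes around items that contain spaces.
    """
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader keeps quoted multi-word items together and drops the quotes
    return next(csv.reader([inner]))


record = '{"Wireless Internet",Kitchen,"Free parking on premises",Breakfast}'
print(parse_amenities(record))
# ['Wireless Internet', 'Kitchen', 'Free parking on premises', 'Breakfast']
```

To get one column per amenity, something like `pd.DataFrame(df["amenities"].map(parse_amenities).tolist())` should work; rows with fewer amenities are padded with missing values, and `.add_prefix("Amenity")` can name the columns.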

Upvotes: 1
