Reputation: 637
I have a column in my DataFrame called "amenities", and this is what one record looks like:
print(df["amenities"][0])
{"Wireless Internet",Kitchen,"Free parking on premises",Breakfast,Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,"Hair dryer","Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}
What I'm trying to do is remove the special characters and then separate the amenities so that each one has its own column, like this:
Room  Amenity1           Amenity2  Amenity3      Amenity4
1     Wireless Internet  Kitchen   Free Parking  Breakfast
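Here is what I tried:
import re
# Replace every run of non-word characters (quotes, braces, commas, slashes, ...) with a single space
df['amenities'] = df['amenities'].map(lambda x: re.sub(r'\W+', ' ', x))
This produces strings like: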
Wireless Internet Air conditioning Pool Kitchen Free parking on premises Gym Hot tub Indoor fireplace Heating Family kid friendly Suitable for events Washer Dryer Smoke detector Carbon monoxide detector First aid kit Fire extinguisher Essentials Shampoo Lock on bedroom door 24 hour check in Hangers Hair dryer Iron Laptop friendly workspace
This cleans the string, but now I don't know how to split it into separate columns, because "Wireless Internet" should be one column, not two, and after the substitution the quotes and commas that separated one amenity from the next are gone.
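One way to keep multi-word amenities together is to split on the commas before removing the special characters. The sketch below assumes every value has the brace-wrapped format shown above (comma-separated, with multi-word entries in double quotes) and uses Python's standard csv module, which already understands that quoting; parse_amenities and NON_WORD are hypothetical names chosen for illustration.

```python
import csv
import re

# One or more non-word characters (anything except letters, digits and "_")
NON_WORD = re.compile(r"\W+")


def parse_amenities(raw: str) -> list[str]:
    """Hypothetical helper: turn '{"A B",C,...}' into ['A B', 'C', ...]."""
    inner = raw.strip().strip("{}")  # drop the surrounding braces
    if not inner:
        return []
    # csv.reader keeps commas inside double quotes as part of the field,
    # so "Wireless Internet" comes back as a single item
    fields = next(csv.reader([inner]))
    # Only now remove special characters, one field at a time
    return [NON_WORD.sub(" ", field).strip() for field in fields]


record = '{"Wireless Internet",Kitchen,"Free parking on premises","Family/kid friendly"}'
print(parse_amenities(record))
```

Because each field is cleaned on its own, "Family/kid friendly" becomes "Family kid friendly" without running into its neighbours.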
Upvotes: 2
Views: 109
Reputation: 20117
In general, prefer list comprehensions to map() with a lambda. They are more readable and usually achieve the same thing. You could go about it like this:
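import re
sc_sub = re.compile(r'\W+')  # one or more non-word characters
# replace each run with a single space, as the map() version in the question does
df['amenities'] = [sc_sub.sub(' ', amenity) for amenity in df['amenities']]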
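To get one column per amenity, as the question asks, pandas' vectorised string methods can split on the commas first and clean each cell afterwards. This is a sketch on a made-up two-row DataFrame; it assumes no amenity name contains a comma of its own, because str.split(",", expand=True) does not understand quoting. The Amenity1, Amenity2, ... column names simply mirror the layout in the question.

```python
import pandas as pd

# Made-up sample data in the same format as the question's column
df = pd.DataFrame({
    "amenities": [
        '{"Wireless Internet",Kitchen,"Free parking on premises",Breakfast}',
        '{Kitchen,"Family/kid friendly"}',
    ]
})

# Drop the surrounding braces and the double quotes, leaving "A B,C,D"
flat = df["amenities"].str.strip("{}").str.replace('"', "", regex=False)

# Split on commas BEFORE cleaning so multi-word amenities stay in one cell;
# expand=True spreads the pieces across columns (shorter rows are padded with missing values)
wide = flat.str.split(",", expand=True)

# Now clean each cell: collapse runs of non-word characters to one space
wide = wide.apply(lambda col: col.str.replace(r"\W+", " ", regex=True).str.strip())
wide.columns = [f"Amenity{i}" for i in range(1, wide.shape[1] + 1)]
print(wide)
```

If some names can contain quoted commas, building the rows with a csv-based parser and passing the resulting list of lists to pd.DataFrame is the safer route.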
Upvotes: 1