I have real estate properties and their details (17 columns) in a CSV file with nearly half a million entries. One of the columns gives a location, but it is a bit too detailed. To categorize my entries, I want to reduce each location to a more generic area. The areas I want to categorize the entries into would be in a list such as:
keywords = ['Downtown','Park View','Industrial District', ... ]
So ideally I would like to take an entry such as `Sky Tower Downtown Los Angeles` and classify it as `Downtown`.
So the task is to first detect the keyword in the `Location` column and then write it to a new column (right beside it, if possible). If no keyword is found in the entry, I would like to classify it as `Other`.
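The rule described above (the first keyword found wins, otherwise `Other`) can be sketched as a plain Python function. This is a minimal sketch, not code from the question: the name `classify_location` is hypothetical, and it assumes case-sensitive substring matching and that a missing location (an empty CSV cell, which pandas reads as `NaN`) should also map to `Other`.

```python
def classify_location(location, keywords, default="Other"):
    """Return the first keyword contained in `location`, else `default`.

    Hypothetical helper: case-sensitive substring matching; keywords are
    checked in list order, so earlier entries take priority.
    """
    # Empty CSV cells come back from pandas as float NaN, not str
    if not isinstance(location, str):
        return default
    for kw in keywords:
        if kw in location:
            return kw
    return default


keywords = ["Downtown", "Park View", "Industrial District"]
print(classify_location("Sky Tower Downtown Los Angeles", keywords))  # Downtown
print(classify_location("Central Park Residential Tower, 5th Avenue", keywords))  # Other
print(classify_location("Meadow Gardens, Park View", keywords))  # Park View
```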
So the data would go from something like this:
Date | Record_Type | Location | Proterty_Type | ... | Price |
---|---|---|---|---|---|
19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Apartment | ... | 15000 |
19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Apartment | ... | 17000 |
20-Mar-21 | Active Listing | Meadow Gardens, Park View | Villa | ... | 125000 |
To something like:
Date | Record_Type | Location | Area | Proterty_Type | ... | Price |
---|---|---|---|---|---|---|
19-Mar-21 | Active Listing | Sky Tower Downtown Los Angeles | Downtown | Apartment | ... | 15000 |
19-Mar-21 | Active Listing | Central Park Residential Tower, 5th Avenue | Other | Apartment | ... | 17000 |
20-Mar-21 | Active Listing | Meadow Gardens, Park View | Park View | Villa | ... | 125000 |
Finally, the result should be saved to a new CSV file. Ideally I would like to use pandas to read and write the CSV.
Thanks in advance!
Edit: I have tried methods from the following threads, but I get errors and I don't know what's wrong, so I'm open to fresh ideas.
How to append a new column to a CSV file using Python?
Adding new column to CSV in Python
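For the read/write part of the question, the whole round trip with pandas (read the CSV, insert the new column right after `Location`, write a new file) could look roughly like the sketch below. The function name `add_area_column` and the file paths in the example call are placeholders, not from the question; it assumes the CSV has a header row containing a `Location` column.

```python
import pandas as pd


def add_area_column(in_path, out_path, keywords, default="Other"):
    """Read a CSV, insert an 'Area' column right after 'Location', save to a new CSV.

    Hypothetical helper; assumes the file has a header row with a 'Location' column.
    """
    df = pd.read_csv(in_path)

    def classify(location):
        # Non-string cells (e.g. NaN for an empty Location) fall back to the default
        if not isinstance(location, str):
            return default
        # First keyword (in list order) contained in the location, else the default
        return next((kw for kw in keywords if kw in location), default)

    # Position directly after 'Location', wherever it sits among the columns
    loc = df.columns.get_loc("Location") + 1
    df.insert(loc=loc, column="Area", value=df["Location"].map(classify))

    # index=False avoids writing pandas' row index as an extra first column
    df.to_csv(out_path, index=False)
    return df


# Example call (placeholder paths):
# add_area_column("properties.csv", "properties_with_area.csv",
#                 ["Downtown", "Park View", "Industrial District"])
```

Looking up the position with `df.columns.get_loc` rather than hard-coding it means the new column still lands beside `Location` even if the real file has its 17 columns in a different order.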
If you have this dataframe:
Date Record_Type Location Proterty_Type Price
0 19-Mar-21 Active Listing Sky Tower Downtown Los Angeles Apartment 15000
1 19-Mar-21 Active Listing Central Park Residential Tower, 5th Avenue Apartment 17000
2 20-Mar-21 Active Listing Meadow Gardens, Park View Villa 125000
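To try the answer without the real file, the three example rows can be built directly. This is only a test fixture typed in from the question's table (the elided `...` columns are left out); with the actual data, `df` would instead come from `pd.read_csv` on the CSV file.

```python
import pandas as pd

# The three sample rows from the question; the "..." columns are omitted
df = pd.DataFrame(
    {
        "Date": ["19-Mar-21", "19-Mar-21", "20-Mar-21"],
        "Record_Type": ["Active Listing"] * 3,
        "Location": [
            "Sky Tower Downtown Los Angeles",
            "Central Park Residential Tower, 5th Avenue",
            "Meadow Gardens, Park View",
        ],
        "Proterty_Type": ["Apartment", "Apartment", "Villa"],
        "Price": [15000, 17000, 125000],
    }
)
print(df)
```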
Then:
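keywords = ["Downtown", "Park View", "Industrial District"]

# Insert "Area" at position 3 (directly after "Location"); each row gets the
# first keyword found in its Location, or "Other" if none matches
df.insert(
    loc=3,
    column="Area",
    value=df["Location"].apply(
        lambda x: next((kw for kw in keywords if kw in x), "Other")
    ),
)
print(df)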
This creates an `Area` column next to `Location` and prints:
Date Record_Type Location Area Proterty_Type Price
0 19-Mar-21 Active Listing Sky Tower Downtown Los Angeles Downtown Apartment 15000
1 19-Mar-21 Active Listing Central Park Residential Tower, 5th Avenue Other Apartment 17000
2 20-Mar-21 Active Listing Meadow Gardens, Park View Park View Villa 125000
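With nearly half a million rows, a vectorized alternative may be faster than `apply` with a Python lambda, though that is worth timing on the real data. The sketch below swaps in a different technique: a single regular expression built from the keywords and passed to `Series.str.extract`. It is not exactly equivalent to the `next(...)` version. When a location contains several keywords, the regex returns the one that appears earliest in the text, whereas the code above returns the one listed earliest in `keywords`. Missing locations come back from `extract` as `NaN` and are then filled with `Other`.

```python
import re

import pandas as pd

keywords = ["Downtown", "Park View", "Industrial District"]

df = pd.DataFrame(
    {
        "Location": [
            "Sky Tower Downtown Los Angeles",
            "Central Park Residential Tower, 5th Avenue",
            "Meadow Gardens, Park View",
            None,  # a missing location
        ]
    }
)

# One capturing group with the keywords as alternatives; re.escape guards
# against keywords that contain regex metacharacters such as "." or "("
pattern = "(" + "|".join(re.escape(kw) for kw in keywords) + ")"

# extract returns the first match per row, or NaN when nothing matches
area = df["Location"].str.extract(pattern, expand=False).fillna("Other")

df.insert(loc=df.columns.get_loc("Location") + 1, column="Area", value=area)
print(df)
```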