Mahmood

Reputation: 45

Keyword Detection in CSV and add to new column

I have real estate properties and their details (17 columns) in a CSV file with nearly half a million entries. One of the columns gives a location, but it is more detailed than I need. I want to categorize my entries by simplifying the location into more generic areas. The areas I want to use as categories would be in a list such as:

keywords = ['Downtown','Park View','Industrial District', ... ]

Ideally, an entry with a location such as Sky Tower Downtown Los Angeles would be classified as Downtown.

So the task is to first detect the keyword in the location column and then add it to a new column (right beside it if possible). If no keyword is found in the entry, I would like to classify it as Other.

It would look something like this:

Date       Record_Type     Location                                    Proterty_Type  ...  Price
19-Mar-21  Active Listing  Sky Tower Downtown Los Angeles              Apartment      ...  15000
19-Mar-21  Active Listing  Central Park Residential Tower, 5th Avenue  Apartment      ...  17000
20-Mar-21  Active Listing  Meadow Gardens, Park View                   Villa          ...  125000

To something like:

Date       Record_Type     Location                                    Area       Proterty_Type  ...  Price
19-Mar-21  Active Listing  Sky Tower Downtown Los Angeles              Downtown   Apartment      ...  15000
19-Mar-21  Active Listing  Central Park Residential Tower, 5th Avenue  Other      Apartment      ...  17000
20-Mar-21  Active Listing  Meadow Gardens, Park View                   Park View  Villa          ...  125000

Finally, the result should be saved to a new CSV file. Ideally I would also like to use pandas to read and write the CSV.

Thanks in advance!

Edit: I have tried the methods from the following threads, but I get errors and I don't know what's wrong, so I'm open to fresh ideas.

How to append a new column to a CSV file using Python?

Adding new column to CSV in Python

Upvotes: 1

Views: 415

Answers (1)

Andrej Kesely

Reputation: 195408

If you have this dataframe:

        Date     Record_Type                                    Location Proterty_Type   Price
0  19-Mar-21  Active Listing              Sky Tower Downtown Los Angeles     Apartment   15000
1  19-Mar-21  Active Listing  Central Park Residential Tower, 5th Avenue     Apartment   17000
2  20-Mar-21  Active Listing                   Meadow Gardens, Park View         Villa  125000

Then:

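# Generic areas to look for, checked in list order.
keywords = ["Downtown", "Park View", "Industrial District"]

# Insert the new column at position 3, i.e. right after "Location" in this frame.
df.insert(
    loc=3,
    column="Area",
    # For each location, take the first keyword it contains, else "Other".
    value=df["Location"].apply(
        lambda x: next((kw for kw in keywords if kw in x), "Other")
    ),
)
print(df)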

This creates an Area column right after Location and prints:

        Date     Record_Type                                    Location       Area Proterty_Type   Price
0  19-Mar-21  Active Listing              Sky Tower Downtown Los Angeles   Downtown     Apartment   15000
1  19-Mar-21  Active Listing  Central Park Residential Tower, 5th Avenue      Other     Apartment   17000
2  20-Mar-21  Active Listing                   Meadow Gardens, Park View  Park View         Villa  125000
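
Since the question also asks about reading and writing the CSV with pandas, here is a rough sketch of the whole round trip. It swaps the row-wise apply for Series.str.extract with a single regular expression, which may be faster on roughly half a million rows, though I have not measured it. The function names add_area_column and process_csv, and the file names in the usage note, are placeholders I made up for illustration. Note one behavioural difference: the regex returns the keyword that appears earliest in the location string, whereas the apply version returns the first keyword in list order. Missing locations (NaN) end up as Other here instead of raising an error.

```python
import re

import pandas as pd

keywords = ["Downtown", "Park View", "Industrial District"]


def add_area_column(df, keywords, source="Location", target="Area", default="Other"):
    """Return a copy of df with `target` inserted right after `source`."""
    # One alternation pattern; re.escape makes each keyword match literally.
    pattern = "(" + "|".join(re.escape(kw) for kw in keywords) + ")"
    # Rows without a match (or with a missing location) become NaN, then `default`.
    area = df[source].str.extract(pattern, expand=False).fillna(default)
    out = df.copy()
    # Look up the position of `source` instead of hard-coding it.
    out.insert(out.columns.get_loc(source) + 1, target, area)
    return out


def process_csv(in_path, out_path, keywords):
    """Read a CSV, add the Area column, and write the result to a new CSV."""
    df = pd.read_csv(in_path)
    add_area_column(df, keywords).to_csv(out_path, index=False)


# Small in-memory check using the sample rows from the question.
df = pd.DataFrame(
    {
        "Date": ["19-Mar-21", "19-Mar-21", "20-Mar-21"],
        "Record_Type": ["Active Listing"] * 3,
        "Location": [
            "Sky Tower Downtown Los Angeles",
            "Central Park Residential Tower, 5th Avenue",
            "Meadow Gardens, Park View",
        ],
        "Proterty_Type": ["Apartment", "Apartment", "Villa"],
        "Price": [15000, 17000, 125000],
    }
)
result = add_area_column(df, keywords)
print(result)
```

For the real file you would call something like process_csv("properties.csv", "properties_with_area.csv", keywords), with the paths replaced by your own.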

Upvotes: 1
