ChiAk

Reputation: 11

Creating a new field based on text matching and conditions from multiple fields. Python

I have a data frame like the one below, and I want to assign a new category based on matching certain words in the "Review" field combined with certain "Product" types. I created two lists of n-grams, one per category, and I need "My Category" to be set when any entry in a list appears in the "Review" and the product is one of the selected product types. The code needs to assign multiple categories when more than one applies.

Record ID | Product | Review | My Category
123 | Tablet | Battery life sucks. Don't buy. | Category 1
456 | Laptop | Love the sleek design, but battery life is bad. | Category 2
789 | Tablet | I love it, even though it sucks sometimes. | Category 1, Category 2

The code I have assigns single and multiple categories based on word matches from the lists, but I can't figure out why it's not taking Product types into consideration.

Category_1 = [" battery ", "sucks "]
Category_2 = [" battery ", " love ", " design ", "thin"]

df['My Category']= ''
for index, row in df.iterrows():
    data=df['Description'].iloc[index]
    
    check=["true" for word in Category_1 if(word in data)]
    if("true" in check) &  df['Product'].isin(['tablet']).any(): 
        df['My Category'].iloc[index] = df['My Category'].iloc[index] + ', ' + 'Category 1' 

    check=["true" for word in Category_2 if(word in data)]
    if("true" in check) &  df['Product'].isin(['tablet', 'laptop']).any():
        df['My Category'].iloc[index] = df['My Category'].iloc[index] + ', ' + 'Category 2' 

Basically Category 1 if Review has any of the words [" battery ", " sucks "] AND product is tablet. Category 2 if review has any of the words [" battery ", " love ", " design ", "thin"] AND product is either laptop or tablet. Categories aren't mutually exclusive.

The " & df['Product'].isin(['product types']).any() " part doesn't do anything. Can anyone tell me why, or how to fix it?

Upvotes: 1

Views: 123

Answers (1)

Eric Truett

Reputation: 3010

You should use apply with axis=1 for this task, so that each row's Product and Review are checked together.

import pandas as pd
from io import StringIO

data = StringIO("""
Record ID   Product Review  
123 Tablet  Battery life sucks.
456 Laptop  Love the sleek design, but battery life is bad.
789 Tablet  I love it, even though it sucks sometimes.
""")

df = pd.read_csv(data, sep='\t')


def categorize(row):
    """Gets category from row
         Can access columns with dot notation, e.g., row.Product
    """
    # determine categories
    #return categories


df['categories'] = df.apply(categorize, axis=1)
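
The skeleton above leaves the body of categorize empty, so here is one possible way to fill it in. It is only a sketch under some assumptions: the RULES table is a hypothetical structure built from the rules stated in the question, matching is done on whole lowercase words rather than on the space-padded substrings from the original lists, and product names are compared case-insensitively. It also shows why the original check had no effect: df['Product'].isin([...]).any() tests the entire column, so it gives the same answer for every row, whereas here the product of the current row is tested.

```python
import re

import pandas as pd

# Hypothetical rule table (not from the original answer):
# category name -> (keywords, product types the category applies to).
RULES = {
    "Category 1": ({"battery", "sucks"}, {"tablet"}),
    "Category 2": ({"battery", "love", "design", "thin"}, {"tablet", "laptop"}),
}


def categorize(row):
    """Return a comma-separated string of every category the row matches."""
    # Assumption: whole-word, case-insensitive matching on the Review text.
    words = set(re.findall(r"[a-z']+", row["Review"].lower()))
    product = row["Product"].lower()
    matched = [
        name
        for name, (keywords, products) in RULES.items()
        # Both conditions are evaluated for this row only.
        if product in products and words & keywords
    ]
    return ", ".join(matched)


df = pd.DataFrame({
    "Record ID": [123, 456, 789],
    "Product": ["Tablet", "Laptop", "Tablet"],
    "Review": [
        "Battery life sucks. Don't buy.",
        "Love the sleek design, but battery life is bad.",
        "I love it, even though it sucks sometimes.",
    ],
})

df["categories"] = df.apply(categorize, axis=1)
print(df[["Record ID", "Product", "categories"]])
```

With the rules exactly as written in the question, record 123 comes out as "Category 1, Category 2" rather than "Category 1" alone, because "battery" is in both keyword lists and Tablet is allowed for both categories. If that is not the intended result, the keyword sets in RULES would need adjusting.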

Upvotes: 1
