Devarshi Goswami

Reputation: 1225

Create a new category column using pandas str.contains and loops

I have a data frame that looks like this:

   name             Title
    abc          'Tech support'
    xyz          'UX designer'
    ghj          'Manager IT'
     ...            ....

I want to iterate through the data frame and, using str.contains, make another column that categorizes those jobs. There are 8 categories.

The output should be:

name              Title             category
abc           'Tech support'         'Support' 
xyz           'UX designer'          'Design'
ghj           'Manager IT'           'Management'
...              ....              ....

Here's what I've tried so far:

for i in range(len(df)):
    if   df.Title[i].str.contains("Support"):
            df.category[i]=="Support"
    elif df.Title[i].str.contains("designer"):
            df.category[i]=="Design"
    else df.Title[i].str.contains("Manager"):
            df.category[i]=="Management"

Of course, I'm a noob at programming, and this throws the error:

  File "<ipython-input-29-d9457f9cb172>", line 6
    else df.Title[i].str.contains("Manager"):
          ^
SyntaxError: invalid syntax

Upvotes: 2

Views: 1826

Answers (4)

Balaji Ambresh

Reputation: 5037

Here you go:

import numpy as np
import pandas as pd
from io import StringIO

# A raw string keeps '\s' from being treated as an (invalid) escape sequence
df = pd.read_csv(StringIO("""
name             Title
abc          Tech support
xyz          UX designer
ghj          Manager IT
"""), sep=r'\s{2,}', engine='python')

# One boolean mask per category; np.select uses the first mask that is True
masks = [
    df.Title.str.lower().str.contains('support'),
    df.Title.str.lower().str.contains('designer'),
    df.Title.str.lower().str.contains('manager'),
]

values = [
    'Support',
    'Design',
    'Management',
]

df['Category'] = np.select(masks, values, default='Unknown')
print(df)

Output:

  name         Title    Category
0  abc  Tech support     Support
1  xyz   UX designer      Design
2  ghj    Manager IT  Management
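Since the question mentions 8 categories, writing each mask by hand gets repetitive. One option is to build the masks and values from a single mapping. This is a minimal sketch, assuming a hypothetical keyword_to_category dict (only three of the eight categories are filled in) and an extra made-up title, "Data analyst", to show the default at work:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["abc", "xyz", "ghj", "klm"],
    # "Data analyst" is a made-up row with no matching keyword
    "Title": ["Tech support", "UX designer", "Manager IT", "Data analyst"],
})

# Hypothetical keyword -> category mapping; extend it to all 8 categories.
# Order matters: np.select picks the first mask that is True.
keyword_to_category = {
    "support": "Support",
    "designer": "Design",
    "manager": "Management",
}

# case=False makes the match case-insensitive; regex=False treats each
# keyword as a literal substring rather than a regular expression.
masks = [df["Title"].str.contains(kw, case=False, regex=False)
         for kw in keyword_to_category]
values = list(keyword_to_category.values())

df["Category"] = np.select(masks, values, default="Unknown")
print(df)
```

Adding a category then only means adding one entry to the dict.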

Upvotes: 1

foo

Reputation: 71

The general syntax of a Python if statement is:

if test expression:
     Body of if
elif test expression:
     Body of elif
else: 
     Body of else

As the syntax shows, a test expression can only appear in an if or elif clause. Your code raises the SyntaxError because a test expression follows else. Change the last else to elif and add a fallback case, for example:

else:
    df.loc[i, 'category'] = "Others"
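A corrected version of the whole loop from the question could look like the following sketch. Besides the else/elif fix, it makes two further changes: df.Title[i] is a plain Python string, which has no .str accessor, so the substring test uses the in operator; and == only compares, so values are assigned with df.loc[i, 'category'] = .... It assumes the default RangeIndex (0, 1, 2, ...), so that label i is also row i.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["abc", "xyz", "ghj"],
    "Title": ["Tech support", "UX designer", "Manager IT"],
})

# Create the column up front so every row has a slot to write into
df["category"] = None

# Assumes the default RangeIndex, so label i matches position i
for i in range(len(df)):
    title = df.loc[i, "Title"].lower()  # a plain str: use `in`, not .str.contains
    if "support" in title:
        df.loc[i, "category"] = "Support"      # `=` assigns; `==` only compares
    elif "designer" in title:
        df.loc[i, "category"] = "Design"
    elif "manager" in title:
        df.loc[i, "category"] = "Management"
    else:                                      # else takes no condition
        df.loc[i, "category"] = "Others"

print(df)
```

This works for small frames, but the vectorized answers (np.select, str.extract) avoid the Python-level loop entirely.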

Upvotes: 1

JCSommer

Reputation: 24

This answer, "Iterate through rows and change value", should get you going!

Let me know if you have more questions!

Upvotes: 1

Quang Hoang

Reputation: 150745

You can do something like this:

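import re

# Lowercase keywords -> labels; str.extract is case-sensitive by default, so "Tech support" needs IGNORECASE
cat_dict = {"support": "Support", "designer": "Design", "manager": "Management"}

df['category'] = (df['Title'].str.extract(fr"\b({'|'.join(cat_dict.keys())})\b", flags=re.IGNORECASE)[0]
                    .str.lower()
                    .map(cat_dict)
                 )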
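To try the str.extract approach end to end, here is a self-contained sketch on the question's sample data plus one made-up title ("Data analyst") that matches no keyword. It matches case-insensitively because the sample title "Tech support" is lowercase, escapes the keywords with re.escape in case a keyword ever contains regex metacharacters, and labels unmatched titles "Other" with fillna. The "Other" label is an assumption.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "name": ["abc", "xyz", "ghj", "klm"],
    "Title": ["Tech support", "UX designer", "Manager IT", "Data analyst"],
})

# Lowercase keywords -> category labels (extend to all 8 categories)
cat_dict = {"support": "Support", "designer": "Design", "manager": "Management"}

# \b...\b restricts matches to whole words; re.escape guards against
# keywords that contain regex metacharacters
pattern = r"\b(" + "|".join(map(re.escape, cat_dict)) + r")\b"

df["category"] = (
    df["Title"].str.extract(pattern, flags=re.IGNORECASE)[0]  # first keyword found, or NaN
    .str.lower()          # normalize case so the keys of cat_dict match
    .map(cat_dict)
    .fillna("Other")      # titles containing none of the keywords
)
print(df)
```

str.extract returns the first keyword found in each title, so a title containing two keywords gets the category of whichever appears first in the text.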

Upvotes: 3
