I have a data frame that looks like this:
name Title
abc 'Tech support'
xyz 'UX designer'
ghj 'Manager IT'
... ....
I want to iterate through the data frame and use df.Title.str.contains to make another column that categorizes those jobs. There are 8 categories.
The output should be:
name Title category
abc 'Tech support' 'Support'
xyz 'UX designer' 'Design'
ghj 'Manager IT' 'Management'
... .... ....
Here's what I've tried so far:
for i in range(len(df)):
    if df.Title[i].str.contains("Support"):
        df.category[i]=="Support"
    elif df.Title[i].str.contains("designer"):
        df.category[i]=="Design"
    else df.Title[i].str.contains("Manager"):
        df.category[i]=="Management"
Of course, I'm a noob at programming, and this throws the error:
File "<ipython-input-29-d9457f9cb172>", line 6
else df.Title[i].str.contains("Manager"):
^
SyntaxError: invalid syntax
Here you go:
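import numpy as np
import pandas as pd
from io import StringIO

# Columns in the sample are separated by two or more spaces,
# so the single spaces inside a title are kept
df = pd.read_csv(StringIO("""
name  Title
abc  Tech support
xyz  UX designer
ghj  Manager IT
"""), sep=r'\s{2,}', engine='python')

# One boolean mask per category, checked in order
masks = [df.Title.str.lower().str.contains('support'),
         df.Title.str.lower().str.contains('designer'),
         df.Title.str.lower().str.contains('manager')
         ]
values = [
    'Support',
    'Design',
    'Management'
]

# Take the value of the first matching mask; 'Unknown' if none match
df['Category'] = np.select(masks, values, default='Unknown')
print(df)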
Output:
name Title Category
0 abc Tech support Support
1 xyz UX designer Design
2 ghj Manager IT Management
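np.select checks the conditions in list order and, for each row, takes the value of the first condition that is true; rows matching none get the default. The sketch below checks that ordering with a made-up title, "Support manager", that matches two keywords (that row and the 'Accountant' row are hypothetical, added only for illustration). It also uses case=False in str.contains instead of lowercasing the column first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "Support manager" matches both 'support' and 'manager'
df = pd.DataFrame({'name': ['abc', 'xyz', 'ghj', 'klm'],
                   'Title': ['Tech support', 'UX designer', 'Support manager', 'Accountant']})

# case=False makes each match case-insensitive without calling str.lower()
masks = [df.Title.str.contains('support', case=False),
         df.Title.str.contains('designer', case=False),
         df.Title.str.contains('manager', case=False)]
values = ['Support', 'Design', 'Management']

# For each row, np.select takes the value of the first True mask, else the default
df['Category'] = np.select(masks, values, default='Unknown')
print(df)
```

"Support manager" ends up as Support because the support mask comes first, so order the masks by priority when titles can match more than one of the 8 categories.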
The general syntax of a Python if statement is:
if test expression:
    Body of if
elif test expression:
    Body of elif
else:
    Body of else
As you can see from the syntax, a test expression can only appear in an if or elif clause; else takes no condition. The code throws the SyntaxError because a test expression is placed after else. Change the last else to elif and add a fallback case for titles that match nothing, like:
else:
    df.loc[i, 'category'] = "Others"
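Putting that together, here is a minimal sketch of the question's loop with the syntax fixed. It assumes the question's sample data plus a hypothetical 'Accountant' row to exercise the fallback. Two further changes are needed beyond else/elif: a single cell such as df.loc[i, 'Title'] is a plain Python str, which has no .str.contains (that accessor exists only on a whole Series), so the sketch uses the in operator; and == only compares, so the sketch assigns with df.loc instead:

```python
import pandas as pd

# Sample data from the question, plus a hypothetical 'Accountant' row for the fallback
df = pd.DataFrame({'name': ['abc', 'xyz', 'ghj', 'klm'],
                   'Title': ['Tech support', 'UX designer', 'Manager IT', 'Accountant']})
df['category'] = None  # create the column before filling it row by row

for i in range(len(df)):
    # A single cell is a plain Python str, so use `in` rather than .str.contains
    title = df.loc[i, 'Title'].lower()
    if 'support' in title:
        df.loc[i, 'category'] = 'Support'  # '=' assigns; '==' would only compare
    elif 'designer' in title:
        df.loc[i, 'category'] = 'Design'
    elif 'manager' in title:
        df.loc[i, 'category'] = 'Management'
    else:  # else takes no condition; it catches every remaining title
        df.loc[i, 'category'] = 'Others'

print(df)
```

A row-by-row loop is fine for a small frame, but the vectorized approaches in the other answers scale better as the data grows.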
The answer to "Iterate through rows and change value" should get you going!
Let me know if you have more questions!
You can do something like this:
# Lowercase keywords to search for -> category labels from the question
cat_dict = {"support": "Support", "designer": "Design", "manager": "Management"}
df['category'] = (df['Title'].str.lower()
                  .str.extract(fr"\b({'|'.join(cat_dict.keys())})\b")[0]
                  .map(cat_dict)
                  )
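Because str.extract is case-sensitive by default, the keywords have to match the capitalization used in the titles exactly, and titles containing no keyword come out as NaN. A sketch of one way to handle both, assuming the question's sample data (the 'Accountant' row and the 'Other' label are hypothetical, for illustration only): match case-insensitively with flags=re.IGNORECASE, lowercase the extracted word before mapping, and fill unmatched rows with fillna:

```python
import re

import pandas as pd

# Sample data from the question plus a hypothetical title with no keyword
df = pd.DataFrame({'name': ['abc', 'xyz', 'ghj', 'klm'],
                   'Title': ['Tech support', 'UX designer', 'Manager IT', 'Accountant']})

# Lowercase keyword -> category label; the remaining categories would go here too
cat_dict = {"support": "Support", "designer": "Design", "manager": "Management"}
pattern = fr"\b({'|'.join(cat_dict)})\b"  # e.g. \b(support|designer|manager)\b

df['category'] = (df['Title']
                  .str.extract(pattern, flags=re.IGNORECASE)[0]  # first capture group
                  .str.lower()        # normalize case so it matches the dict keys
                  .map(cat_dict)
                  .fillna('Other'))   # titles with no keyword
print(df)
```

Keeping the keywords in one dict means adding the other categories only touches cat_dict; the regex is rebuilt from its keys.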