Reputation: 327
I'm trying to create an if statement that populates the tool type column with Legal
if the Pay Category column is Expense
.
However, it is flaging everything that contains Legal
as Legal
, regardless of Pay Category.
test={"Pay Category":["Indemnity","Indemnity","Indemnity","Indemnity","Expense","Expense","Expense","Medical"],"Description of Payment":["Legal","Legal","Legal","Legal","Legal","Legal","Frog","Legal",]}
test=pd.DataFrame(test)
test["Tool Type"]=""
if (test["Pay Category"]=="Medical") is not False:
test["Tool Type"][test["Description of Payment"].str.contains("Pharmacy|Prescription|RX",case=False)]="Pharmacy"
if (test["Pay Category"]=='Expense') is not False:
test["Tool Type"][test["Description of Payment"].str.contains("Legal|Attorney|Court|Defense",case=False)]="Legal"
My understanding is, if (test["Pay Category"]=='Expense') is not False:
is a boolean value which is either True or False
, it should only execute the if statement if the criteria " is not False" is met. What am I missing?
Brandon
Upvotes: 1
Views: 73
Reputation: 863226
I think you need add conditions and chain them with &
(and
):
test["Tool Type"]=""
m1 = test["Description of Payment"].str.contains("Pharmacy|Prescription|RX",case=False)
m2 = test["Pay Category"]=="Medical"
m3 = test["Description of Payment"].str.contains("Legal|Attorney|Court|Defense",case=False)
m4 = test["Pay Category"]=="Expense"
test.loc[m1 & m2, "Tool Type"]="Pharmacy"
test.loc[m3 & m4, "Tool Type"]="Legal"
print (test)
Description of Payment Pay Category Tool Type
0 Legal Indemnity
1 Legal Indemnity
2 Legal Indemnity
3 Legal Indemnity
4 Legal Expense Legal
5 Legal Expense Legal
6 Frog Expense
7 Legal Medical
Another solution with double numpy.where
:
test['Tool Type'] = np.where(m1 & m2, 'Pharmacy',
np.where(m3 & m4, 'Legal', ''))
print (test)
Description of Payment Pay Category Tool Type
0 Legal Indemnity
1 Legal Indemnity
2 Legal Indemnity
3 Legal Indemnity
4 Legal Expense Legal
5 Legal Expense Legal
6 Frog Expense
7 Legal Medical
EDIT: Very nice solution with unutbu
comment is use numpy.select
test['Tool Type'] = np.select([(m1 & m2), (m3 & m4)], ['Pharmacy', 'Legal'], default='')
print (test)
Description of Payment Pay Category Tool Type
0 Legal Indemnity
1 Legal Indemnity
2 Legal Indemnity
3 Legal Indemnity
4 Legal Expense Legal
5 Legal Expense Legal
6 Frog Expense
7 Legal Medical
Upvotes: 3