Reputation: 101
FYI, performance/speed is not important for this question.
I have an existing pandas dataframe named cost_table
...
+----------+---------+------+-------------------------+-----------------+
| material | percent | qty | price_control_indicator | acct_assign_cat |
+----------+---------+------+-------------------------+-----------------+
| abc111 | 1.00 | 50 | v | # |
| abc222 | 0.25 | 2000 | s | # |
| xyz789 | 0.45 | 0 | v | m |
| def456 | 0.9 | 0 | v | # |
| 123xyz | 0.2 | 0 | v | m |
| lmo888 | 0.6 | 0 | v | m |
+----------+---------+------+-------------------------+-----------------+
I need to add a field cost_source
based on values in multiple fields.
Most answers that come up on google involve a list comprehension or a ternary operator but those only include logic based on a value in one column. For example,
cost_table['cost_source'] = ['map' if qty > 0 else None for qty in cost_table['qty']]
This works based on a value in one column, but I don't know how to expand this to include logic in multiple columns (or if it's even possible?). It also doesn't seem like a very readable/maintainable solution.
I tried using a for in
loop with an if elif
statement but the value in cost_table['cost_source']
remains unchanged and is None
for all rows. But if I print each individual row within my loop then row['cost_source']
has the desired value.
d = {
'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
'percent': [1, .25, .45, .9, .2, .6],
'qty': [50, 2000, 0, 0, 0, 0],
'price_control_indicator': ['v', 's','v', 'v', 'v', 'v'],
'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}
cost_table = pd.DataFrame(data=d)
cost_table['cost_source'] = None
for index, row in cost_table.iterrows():
if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
row['cost_source'] = "map"
elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
row['cost_source'] = "vendor"
else:
row['cost_source'] = None
print(row['cost_source']) # outputs map, vendor, or None as expected
print(cost_table)
Which outputs ...
+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111 | 1.00 | 50 | v | # | None |
| abc222 | 0.25 | 2000 | s | # | None |
| xyz789 | 0.45 | 0 | v | m | None |
| def456 | 0.9 | 0 | v | # | None |
| 123xyz | 0.2 | 0 | v | m | None |
| lmo888 | 0.6 | 0 | v | m | None |
+----------+---------+------+-------------------------+-----------------+-------------+
And this is my desired result ...
+----------+---------+------+-------------------------+-----------------+-------------+
| material | percent | qty | price_control_indicator | acct_assign_cat | cost_source |
+----------+---------+------+-------------------------+-----------------+-------------+
| abc111 | 1.00 | 50 | v | # | map |
| abc222 | 0.25 | 2000 | s | # | map |
| xyz789 | 0.45 | 0 | v | m | vendor |
| def456 | 0.9 | 0 | v | # | map |
| 123xyz | 0.2 | 0 | v | m | None |
| lmo888 | 0.6 | 0 | v | m | vendor |
+----------+---------+------+-------------------------+-----------------+-------------+
Upvotes: 1
Views: 3026
Reputation: 75080
If you wish to use np.select
cond1 = cost_table.qty.gt(0) | cost_table.price_control_indicator.eq('s') | cost_table.acct_assign_cat.eq('#')
cond2 = cost_table.percent.ge(0.4) & cost_table.acct_assign_cat.eq('m')
cost_table['cost_source'] = np.select([cond1, cond2], ['map', 'vendor'], default='None')
print(cost_table)
material percent qty price_control_indicator acct_assign_cat cost_source
0 abc111 1.00 50 v # map
1 abc222 0.25 2000 s # map
2 xyz789 0.45 0 v m vendor
3 def456 0.90 0 v # map
4 123xyz 0.20 0 v m None
5 lmo888 0.60 0 v m vendor
Upvotes: 2
Reputation: 1188
As @bazinga stated, use df.apply(lambda x: fun(x)
, but with parameter axis=1
, so the lambda function is applied to row by row (default is column by column).
d = {
'material': ['abc111', 'abc222', 'xyz789', 'def456', '123xyz', 'lmo888'],
'percent': [100, 25, 45, 90, 20, 60],
'qty': [50, 2000, 0, 0, 0, 0],
'price_control_indicator': ['v', 's','v', 'v', 'v', 'v'],
'acct_assign_cat': ['#', '#', 'm', '#', 'm', 'm']
}
cost_table = pd.DataFrame(data=d)
def process_row(row):
if (row['qty'] > 0) or (row['price_control_indicator'] == "s") or (row['acct_assign_cat'] == "#"):
return "map"
elif (row['percent'] >= 40) and (row['acct_assign_cat'] == "m"):
return "vendor"
else:
return None
cost_table['cost_source'] = cost_table.apply(lambda row: process_row(row), axis=1)
print(cost_table)
(I also corrected an inconsistency: in the data procents
should be probably multiplied by 100)
Upvotes: 5