Reputation: 35
Searched multiple threads on here for an answer but nothing is quite like what I am looking to do. I am trying to make a boolean column in a dataframe where one of the parameters is based on metal and associated cost maximum, see Dict below.
Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}
The dict is used to complete the below formula:
df_Cost['Total Cost'] >= Cost_ranges['Cost Maximum']
i.e. df_Cost['Metal'] contains to value 'Cu', it calls the 'Cost Maximum' of 350 from the dict and uses this for the boolean expression hence formula would read:
df_Cost[Total Cost] >= 350
I need it to apply to all rows in a dataframe. I have been using df.eval() but need an extra layer of processing to match up the right limit per metal.
I have tried using df.eval(), df.query(), df.loc and df.apply() but keep getting 'TypeError: 'Series' objects are mutable, thus they cannot be hashed' or 'ValueError: ('Lengths must match to compare', (9999,), (16,))' for each solution.
Look forward to the responses.
Upvotes: 1
Views: 186
Reputation: 5456
Depending on your actual data, you could do something like:
import numpy as np
import pandas as pd
Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}
N = 20
d = pd.DataFrame({'Metals': np.random.choice(["Cu", "Pb", "Zn", "Ni"], N),
'Cost': np.random.random(N) * 1000})
d.merge(pd.DataFrame(Cost_ranges).astype({'Cost Maximum': float}),
on = "Metals", how = "left")\
.eval('want = Cost > `Cost Maximum`')
# Metals Cost Cost Maximum want
# 0 Cu 297.386007 350.0 False
# 1 Pb 55.570657 200.0 False
# 2 Pb 91.803336 200.0 False
# 3 Cu 916.273995 350.0 True
# 4 Zn 796.383326 200.0 True
# 5 Pb 112.504581 200.0 False
Upvotes: 1
Reputation: 11395
Assuming your df_Cost
looks a little like this (with potentially more rows and columns):
>>> df_Cost
Total Cost Metal
0 315 Cu
1 420 Cu
The easiest way is to use a dictionary to translate from Metal to max cost. Then you need to make Cost_ranges
into a dictionary mapping metal name to cost:
>>> cost_lookup = dict(zip(Cost_ranges['Metals'], Cost_ranges['Cost Maximum']))
>>> cost_lookup
{'Cu': 350, 'Pb': 200, 'Zn': 200, 'Ni': 500, 'Mo': 800, 'Co': 1000, 'Sn': 250, 'U3O8': 2500, 'Ag': 30, 'Au': 2500, 'Pt': 500, 'Pd': 1000, 'Rh': 6000, 'Os': 2500, 'Ru': 2500, 'Ir': 2500}
>>> df_Cost['Metal'].map(cost_lookup)
0 350
1 350
Name: Metal, dtype: int64
>>> df_Cost['Total Cost'] >= df_Cost['Metal'].map(cost_lookup)
0 False
1 True
dtype: bool
Upvotes: 0