WoolyFox
WoolyFox

Reputation: 35

Calling values from dict to create boolean column in dataframe

Searched multiple threads on here for an answer but nothing is quite like what I am looking to do. I am trying to make a boolean column in a dataframe where one of the parameters is based on metal and associated cost maximum, see Dict below.

Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
               'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}

The dict is used to complete the below formula:

df_Cost['Total Cost'] >= Cost_ranges['Cost Maximum']

i.e. df_Cost['Metal'] contains to value 'Cu', it calls the 'Cost Maximum' of 350 from the dict and uses this for the boolean expression hence formula would read:

df_Cost[Total Cost] >= 350

I need it to apply to all rows in a dataframe. I have been using df.eval() but need an extra layer of processing to match up the right limit per metal.

I have tried using df.eval(), df.query(), df.loc and df.apply() but keep getting 'TypeError: 'Series' objects are mutable, thus they cannot be hashed' or 'ValueError: ('Lengths must match to compare', (9999,), (16,))' for each solution.

Look forward to the responses.

Upvotes: 1

Views: 186

Answers (2)

r.user.05apr
r.user.05apr

Reputation: 5456

Depending on your actual data, you could do something like:

import numpy as np
import pandas as pd

Cost_ranges = {'Metals': ["Cu", "Pb", "Zn", "Ni", "Mo", "Co", "Sn", "U3O8", "Ag", "Au", "Pt", "Pd", "Rh", "Os", "Ru", "Ir"],
           'Cost Maximum': [350, 200, 200, 500, 800, 1000, 250, 2500, 30, 2500, 500, 1000, 6000, 2500, 2500, 2500]}

N = 20
d = pd.DataFrame({'Metals': np.random.choice(["Cu", "Pb", "Zn", "Ni"], N),
                  'Cost': np.random.random(N) * 1000})

d.merge(pd.DataFrame(Cost_ranges).astype({'Cost Maximum': float}),
        on = "Metals", how = "left")\
    .eval('want = Cost > `Cost Maximum`')

#    Metals        Cost  Cost Maximum   want
# 0      Cu  297.386007         350.0  False
# 1      Pb   55.570657         200.0  False
# 2      Pb   91.803336         200.0  False
# 3      Cu  916.273995         350.0   True
# 4      Zn  796.383326         200.0   True
# 5      Pb  112.504581         200.0  False

Upvotes: 1

Cimbali
Cimbali

Reputation: 11395

Assuming your df_Cost looks a little like this (with potentially more rows and columns):

>>> df_Cost
   Total Cost Metal
0         315    Cu
1         420    Cu

The easiest way is to use a dictionary to translate from Metal to max cost. Then you need to make Cost_ranges into a dictionary mapping metal name to cost:

>>> cost_lookup = dict(zip(Cost_ranges['Metals'], Cost_ranges['Cost Maximum']))
>>> cost_lookup
{'Cu': 350, 'Pb': 200, 'Zn': 200, 'Ni': 500, 'Mo': 800, 'Co': 1000, 'Sn': 250, 'U3O8': 2500, 'Ag': 30, 'Au': 2500, 'Pt': 500, 'Pd': 1000, 'Rh': 6000, 'Os': 2500, 'Ru': 2500, 'Ir': 2500}
>>> df_Cost['Metal'].map(cost_lookup)
0    350
1    350
Name: Metal, dtype: int64
>>> df_Cost['Total Cost'] >= df_Cost['Metal'].map(cost_lookup)
0    False
1     True
dtype: bool

Upvotes: 0

Related Questions