Reputation: 179
Suppose I have a dataframe with a column of probabilities. I have written a function to map over it that returns 1 if the probability is at least a threshold value and 0 otherwise. The catch is that I want to pass the threshold as an argument to the function when mapping it over the pandas dataframe.
Take the code example below:
import pandas as pd

def partition(x, threshold):
    if x < threshold:
        return 0
    else:
        return 1

df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})
df2 = df.map(partition)
My question is, how would the last line work, i.e. how do I pass the threshold value inside my map function?
Upvotes: 9
Views: 8324
Reputation: 23271
If there are extra arguments, it's better to use apply():
df['new'] = df['probability'].apply(partition, threshold=0.5)
or wrap the function with functools.partial and map this new function:
from functools import partial
df['new'] = df['probability'].map(partial(partition, threshold=0.5))
# a bit more legibly
partition_05 = partial(partition, threshold=0.5)
df['new'] = df['probability'].map(partition_05)
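To check that these routes agree, here is a minimal self-contained sketch using the data from the question. The variable names via_apply, via_args and via_partial are only illustrative, and the args=(0.5,) form is an extra positional variant of the same apply() call.

```python
from functools import partial

import pandas as pd


def partition(x, threshold):
    # 0 below the threshold, 1 at or above it
    return 0 if x < threshold else 1


df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})

# Series.apply forwards extra keyword arguments to the function
via_apply = df['probability'].apply(partition, threshold=0.5)

# ...and extra positional arguments through args=
via_args = df['probability'].apply(partition, args=(0.5,))

# Series.map takes a one-argument callable, so bind the threshold first
via_partial = df['probability'].map(partial(partition, threshold=0.5))

print(via_apply.tolist())
print(via_apply.equals(via_args), via_apply.equals(via_partial))
```

All three should give the same 0/1 column, so the choice is mostly about readability.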
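You can pass the extra argument as a kwarg to applymap() too (in pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which accepts keyword arguments the same way):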
df = df.applymap(partition, threshold=0.5)
That said, please use vectorized code wherever possible. For example, in the OP,
df['new'] = (df['probability'] >= 0.5) * 1
produces the desired column.
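As a rough check that the vectorized comparison matches partition() element by element, the sketch below compares the two on the question's data plus one extra value (0.5, added here as an assumption to exercise the boundary). It uses >= so that a value exactly equal to the threshold maps to 1, as partition() does.

```python
import pandas as pd


def partition(x, threshold):
    # 0 below the threshold, 1 at or above it
    return 0 if x < threshold else 1


# 0.5 is appended to the question's data to test the boundary case
df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95, 0.5]})
threshold = 0.5

# Vectorized: boolean comparison on the whole column, cast to integers
vectorized = (df['probability'] >= threshold).astype(int)

# Element-wise: one Python function call per row
elementwise = df['probability'].apply(partition, threshold=threshold)

print(vectorized.tolist() == elementwise.tolist())
```

On large columns the vectorized form avoids a Python-level call per row, which is usually where the speed difference comes from.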
Upvotes: 1
Reputation: 30930
We can use DataFrame.applymap:
df2 = df.applymap(lambda x: partition(x, threshold=0.5))
Or if only one column:
df['probability']=df['probability'].apply(lambda x: partition(x, threshold=0.5))
But that is not necessary here. You can do:
df2 = df.ge(threshold).astype(int)
I recommend you see it
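A small sketch of the ge() approach, with threshold defined explicitly. The second column, other_probability, is hypothetical and only there to show that ge() compares every column of the frame at once.

```python
import pandas as pd

# 'other_probability' is a made-up column for illustration
df = pd.DataFrame({
    'probability': [0.2, 0.8, 0.4, 0.95],
    'other_probability': [0.5, 0.1, 0.9, 0.3],
})
threshold = 0.5

# ge() is the method form of >=; astype(int) turns True/False into 1/0
df2 = df.ge(threshold).astype(int)

print(df2)
```

To threshold a single column only, the same call works on the Series: df['probability'].ge(threshold).astype(int).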
Upvotes: 9
Reputation: 13397
You can use lambda for that purpose:
import pandas as pd

def partition(x, threshold):
    if x < threshold:
        return 0
    else:
        return 1

df = pd.DataFrame({'probability': [0.2, 0.8, 0.4, 0.95]})
df['probability'] = df['probability'].map(lambda x: partition(x, threshold=0.5))
Upvotes: 3