Reputation: 2556
Edited
I'm sorry I didn't post it right the first time. The solutions suggested worked if there were only two entries of each Lead ID
with Lead Status
"A". I am changing my data. I again apologize.
Data:
Lead ID Lead Status Duration Target
1-1H9C0XL Too Small -0.466177 1
1-1H9G33C A -0.620709 0
1-1H9G33C A -0.500709 0
1-1H9G33C A 0.337401 0
4-1HFORF8 No Fit -0.343840 1
4-1HFSXOG No Fit -0.124920 1
4-1HLQ2IJ A -0.330962 0
4-1HLQ2IJ A 0.130818 0
4-1HLQ2IJ A -0.400817 0
4-1HLQ2IJ A 0.240818 0
I want to accomplish following:
If there is a duplicate in the Lead ID
and Lead Status
, make all the Target
values "1" for that LeadID
with shorter Duration
.
Desired Output
Lead ID Lead Status Duration Target
1-1H9C0XL Too Small -0.466177 1
1-1H9G33C A -0.620709 1
1-1H9G33C A -0.500709 1
1-1H9G33C A 0.337401 0
4-1HFORF8 No Fit -0.343840 1
4-1HFSXOG No Fit -0.124920 1
4-1HLQ2IJ A -0.330962 1
4-1HLQ2IJ A 0.130818 1
4-1HLQ2IJ A -0.400817 1
4-1HLQ2IJ A 0.240818 0
I am not able to implement a condition of checking for duplicates and the value in duration to update the last column. I appreciate any assistance a lot.
Upvotes: 0
Views: 165
Reputation: 61947
Here is an idiomatic and performant answer.
df['Target'] += df.sort_values('Duration')\
.duplicated(subset=['Lead ID', 'Lead Status'], keep='last')
If you don't assume unique rows have a 1 then you can do the following.
df1 = df.sort_values('Duration')
unique = ~df1.duplicated(subset=['Lead ID', 'Lead Status'], keep=False) * 1
first = df1.duplicated(subset=['Lead ID', 'Lead Status'], keep='last') * 1
df['Target'] = unique + first
And a less performant way:
df.groupby(['Lead ID', 'Lead Status'])['Duration']\
.transform(lambda x: 1 if len(x) == 1 else x < x.max())
Lead ID Lead Status Duration Target
0 1-1H9C0XL Too Small -0.466177 1
1 1-1H9G33C A -0.620709 1
2 1-1H9G33C A -0.500709 0
3 1-1H9G33C A 0.337401 1
4 4-1HFORF8 No Fit -0.343840 1
5 4-1HFSXOG No Fit -0.124920 1
6 4-1HLQ2IJ A -0.330962 1
7 4-1HLQ2IJ A 0.130818 1
8 4-1HLQ2IJ A -0.400817 1
9 4-1HLQ2IJ A 0.240818 0
Upvotes: 0
Reputation: 323226
Try this(assuming your df is sorted)
df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep=False)].drop_duplicates(['LeadID','LeadStatus'],keep='first').index,'Target']=1
df
Out[895]:
LeadID LeadStatus Duration Target
0 1-1H9C0XL TooSmall -0.466 1
1 1-1H9G33C A -0.621 1
2 1-1H9G33C A 0.337 0
3 4-1HFORF8 NoFit -0.344 1
4 4-1HFSXOG NoFit -0.125 1
5 4-1HLQ2IJ A -0.331 1
6 4-1HLQ2IJ A 0.241 0
Update
df=df.sort_values(['LeadID','LeadStatus','Duration'])
df.loc[df[df.duplicated(['LeadID','LeadStatus'],keep='last')].index,'Target']=1
Out[911]:
LeadID LeadStatus Duration Target
0 1-1H9C0XL TooSmall -0.466 1
1 1-1H9G33C A -0.621 1
2 1-1H9G33C A -0.501 1
3 1-1H9G33C A 0.337 0
4 4-1HFORF8 NoFit -0.344 1
5 4-1HFSXOG NoFit -0.125 1
8 4-1HLQ2IJ A -0.401 1
6 4-1HLQ2IJ A -0.331 1
7 4-1HLQ2IJ A 0.131 1
9 4-1HLQ2IJ A 0.241 0
Upvotes: 1