Reputation: 5012
In pandas if I have transaction data in dataframe (transdf) that looks like this:
OrderId, ShippmentSegmentsDays
1 , 1
2 , 3
3 , 4
4 , 10
And I have another df (segmentdf) that specify intervals:
ShippmentSegmentDaysStart , ShippmentSegmentDaysEnd , ShippmentSegment
-9999999 , 0 , 'On-Time'
0 , 1 , '1 day late'
1 , 2 , '2 days late'
2 , 3 , '3 days late'
3 , 9999999 , '>3 days late'
And I need to add one more column which is based on "ShippmentSegmentsDays" and "ShippmentSegment". So basically for each row from "transdf" I need to check "ShippmentSegmentsDays" value in which interval can be found from "segmentdf"
As a result the "transdf" should look like this:
OrderId, ShippmentSegmentsDays, ShippmentSegment
1 , 1 , '1 day late'
2 , 0 , 'On-Time'
3 , 4 , '>3 days late'
4 , 10 , '>3 days late'
Can anyone give me an advice how this situation can be handled?
Thanks! Stefan
Upvotes: 1
Views: 3113
Reputation: 97
Old post, but I had the same issue. Pandas offers an Interval function that worked for me.
Upvotes: 1
Reputation: 5074
You can use pandas.apply(args)
to apply a function to each row in the transdf
data frame if you know that the rules set in the segmentdf
are static and don't change. Perhaps the following code snippet may help you. I haven't tested this so be wary, but I think it should get you started in the right direction.
# create a series of just the data from the 'ShippmentSegmentDays' column
seg_days_df = trends['ShippmentSegmentDays']
# Create a new column, 'ShippmentSegment', in 'transdf' data frame by calling
# our utility function on the series created above.
transdf['ShippmentSegment'] = seg_days_df.apply(calc_ship_segment, axis=1)
# Utility function to define the rules set in the 'segmentdf' data frame
def calc_ship_segment(num):
if not num:
return 'On Time'
elif num == 1:
return '1 Day Late'
elif num == 2:
return '2 Days Late'
elif num == 3:
return '3 Days Late'
else:
return '>3 Days Late'
Upvotes: 2