Stefan Stoychev
Stefan Stoychev

Reputation: 5012

Pandas find value in interval

In pandas if I have transaction data in dataframe (transdf) that looks like this:

OrderId, ShippmentSegmentsDays
1      , 1
2      , 3
3      , 4
4      , 10

And I have another df (segmentdf) that specify intervals:

ShippmentSegmentDaysStart , ShippmentSegmentDaysEnd , ShippmentSegment
-9999999                  , 0                       , 'On-Time'
0                         , 1                       , '1 day late'
1                         , 2                       , '2 days late'
2                         , 3                       , '3 days late'
3                         , 9999999                 , '>3 days late'

And I need to add one more column which is based on "ShippmentSegmentsDays" and "ShippmentSegment". So basically for each row from "transdf" I need to check "ShippmentSegmentsDays" value in which interval can be found from "segmentdf"

As a result the "transdf" should look like this:

OrderId, ShippmentSegmentsDays, ShippmentSegment
1      , 1                    , '1 day late'
2      , 0                    , 'On-Time'
3      , 4                    , '>3 days late'
4      , 10                   , '>3 days late'

Can anyone give me an advice how this situation can be handled?

Thanks! Stefan

Upvotes: 1

Views: 3113

Answers (2)

Moritz
Moritz

Reputation: 97

Old post, but I had the same issue. Pandas offers an Interval function that worked for me.

Upvotes: 1

alacy
alacy

Reputation: 5074

You can use pandas.apply(args) to apply a function to each row in the transdf data frame if you know that the rules set in the segmentdf are static and don't change. Perhaps the following code snippet may help you. I haven't tested this so be wary, but I think it should get you started in the right direction.

# create a series of just the data from the 'ShippmentSegmentDays' column
seg_days_df = trends['ShippmentSegmentDays']

# Create a new column, 'ShippmentSegment', in 'transdf' data frame by calling
# our utility function on the series created above.
transdf['ShippmentSegment'] = seg_days_df.apply(calc_ship_segment, axis=1)

# Utility function to define the rules set in the 'segmentdf' data frame
def calc_ship_segment(num):
     if not num:
         return 'On Time'
     elif num == 1:
         return '1 Day Late'
     elif num == 2:
         return '2 Days Late'
     elif num == 3:
         return '3 Days Late'
     else:
         return '>3 Days Late'

Upvotes: 2

Related Questions