Reputation: 52253

How to merge two lists of dates into a series of date intervals?

I have list of start dates, and a list of end dates. They're sorted...

start_dates = [
    datetime.date(2009, 11, 5), datetime.date(2009, 11, 13), 
    datetime.date(2009, 11, 25), datetime.date(2009, 11, 26), 
    datetime.date(2009, 12, 4), datetime.date(2009, 12, 7), 
    datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

end_dates = [
    datetime.date(2009, 10, 1), datetime.date(2009, 10, 2), 
    datetime.date(2009, 10, 9), datetime.date(2009, 10, 12), 
    datetime.date(2009, 11, 4), datetime.date(2009, 12, 14), 
    datetime.date(2009, 12, 15)]

The start dates represent dates on which we received advice to buy a share. The end dates are dates on which which we received advice to sell it. The sources of advice are different and we are backtesting what would happen if we used the buy advice from one source, but the sell advice from another. Thus we have two sequences of dates which we want to resolve into pairs - or intervals - over which we would have held a position in the stock.

Thus we'll take a date from the start_dates to decide when to buy the stock: on Nov 5 we buy a position. Then we cycle through the end_dates looking for the first advice to sell it: Dec 14. And repeat, buying when we don't hold a position when advised to from one source, and selling when we do hold a position from the other source.

You might say we want to flap between which of two lists we're looping over.

Thus the input above generates:

result = (
  (datetime.date(2009, 11, 5), datetime.date(2009, 12, 14)),
  (datetime.date(2009, 12, 29), None)
)

I'm using for-loops inside for-loops and wonder if there is not a better way. Performance is of interest since it will be applied to thousands of scenarios over a 40 year span; some of the lists involve thousands of dates.

Upvotes: 4

Answers (4)

John Mee

Reputation: 52253

I've finally nailed it down to:

    trades = []
    enddate = datetime.date(1900, 1, 1)
    try:
        for startdate in startdates:
            if enddate <= startdate:
                enddate = enddates.next()
                while enddate <= startdate:
                    enddate = enddates.next()
                trades.append((startdate, enddate))
    except StopIteration:
        trades.append((startdate, None))

Thanks to those who asked questions and answered. For no rational reason this little puzzle became a fixation for me but I think finally I have done this to death and should move on with my life. It's really very simple in the end - astonishing how much work it took to make it this plain!

Upvotes: 3

thegauraw

Reputation: 5708

I think this should work to get the date tuple, but I cant suggest you the way without using for loop, because it might get more complicated instead.

The logic is pretty simple and obvious though.

result = []
for startd in start_dates:
    if not result or result[-1][1] is not None and startd>result[-1][1]:
    #can use 'len(result)==0' instead of 'not result'
        for endd in end_dates:
            if endd>startd:
                result.append((startd,endd))
                break
        else:
            result.append((start,None))
    if result[-1][1] is None:
        break

result = tuple(result)
print result

Upvotes: 1

Avaris

Reputation: 36715

Edit

This should scale with len(start_dates)+len(end_dates):

def date_range(start_dates, end_dates):
    result = []

    start_i = 0
    end_i = 0

    while start_i<len(start_dates):
        while end_i<len(end_dates) and start_dates[start_i]>end_dates[end_i]:
            end_i += 1
        if end_i == len(end_dates):
            result.append((start_dates[start_i], None))
            break
        result.append((start_dates[start_i], end_dates[end_i]))
        while start_i<len(start_dates) and start_dates[start_i]<=end_dates[end_i]:
            start_i += 1
        end_i += 1

    return result

Usage:

In  : start_dates = [
   ....:     datetime.date(2009, 11, 5), datetime.date(2009, 11, 13),
   ....:     datetime.date(2009, 11, 25), datetime.date(2009, 11, 26),
   ....:     datetime.date(2009, 12, 4), datetime.date(2009, 12, 7),
   ....:     datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

In : end_dates = [
   ....:     datetime.date(2009, 10, 1), datetime.date(2009, 10, 2),
   ....:     datetime.date(2009, 10, 9), datetime.date(2009, 10, 12),
   ....:     datetime.date(2009, 11, 4), datetime.date(2009, 12, 14),
   ....:     datetime.date(2009, 12, 15)]

In : date_range(start_dates, end_dates)
Out:
[(datetime.date(2009, 11, 5), datetime.date(2009, 12, 14)),
 (datetime.date(2009, 12, 29), None)]

In : start_dates = [
   ....:     datetime.date(2009, 11, 5), datetime.date(2009, 11, 13),
   ....:     datetime.date(2009, 11, 25), datetime.date(2009, 11, 26),
   ....:     datetime.date(2009, 12, 4), datetime.date(2009, 12, 7),
   ....:     datetime.date(2009, 12, 29), datetime.date(2009, 12, 30)]

In : end_dates = [
   ....:     datetime.date(2009, 10, 1), datetime.date(2009, 10, 2),
   ....:     datetime.date(2009, 10, 9), datetime.date(2009, 10, 12),
   ....:     datetime.date(2009, 11, 7), datetime.date(2009, 12, 14), # changed (2009, 11, 4) -> (2009, 11, 7)
   ....:     datetime.date(2009, 12, 15)]

In : date_range(start_dates, end_dates)
Out:
[(datetime.date(2009, 11, 5), datetime.date(2009, 11, 7)),
 (datetime.date(2009, 11, 13), datetime.date(2009, 12, 14)),
 (datetime.date(2009, 12, 29), None)]

Upvotes: 2

Pablo Jomer

Reputation: 10378

How about this.

all_dates = start_dates.expand(end_dates)
all_dates.sort()

look_for_start = 1;
date = []
start_date = None
end_date = None
for i in range(len(all_dates)):
   if look_for_start and all_dates[i] in start_dates:
     start_date = all_dates[i]
     look_for_start = 0;

   elsif !look_for_start and all_dates[i] in end_dates:
     end_date = all_dates[1]
     look_for_start = 1;

   if start_date == end_date:
     end_date == None
     look_for_start = 0;

   if start_date != None and end_date != None;
     date.append((start_date,end_date))
     start_date = None
     end_date = None

After this you have start_dates with end dates as far as possible. just take the remaining set of start_dates and get set their end date to None

Upvotes: 0

How to merge two lists of dates into a series of date intervals?

Answers (4)

Related Questions