C.J

Reputation: 55

Merge two lists of tuples based on the tuples' values

I have two lists, ref_list and data_list, each containing tuples whose first element is a time in seconds and whose second element is a random value, such as:

ref_list = [(1,value_ref_1),(3,value_ref_3),(4,value_ref_4), ... ]
data_list = [(1,value_dat_1),(2,value_dat_2),(4,value_dat_4), ... ]

I want to compute the difference of the second values as a function of time (the first value of each tuple). That is, I want a list of tuples whose first value is a time and whose second value is the difference of the second values. It should also handle missing data in either list by using the value from the last available time. For the previous example, the result would be:

res_list = [(1,value_dat_1-value_ref_1),(2,value_dat_2-value_ref_1),(3,value_dat_2-value_ref_3),(4,value_dat_4-value_ref_4), ... ]

In this example, the tuple (2,value_dat_2-value_ref_1) was created from the tuples (2,value_dat_2) and (1,value_ref_1) because no tuple with 2 as its first element exists in ref_list. The same idea applies the other way around for (3,value_dat_2-value_ref_3).

I can't figure out how to do it with a list comprehension.

I hope I was clear enough.

Thanks a lot.

Upvotes: 1

Views: 715

Answers (2)

Wjars

Reputation: 99

Edit 1: IndexError: if both lists have the same length, you shouldn't get an IndexError. data_list[i] gives the i-th element of data_list, regardless of its content. And when you pop a value from a Python list, the indexes of the following elements shift down, so there is no index gap (unlike in some other languages). Or maybe I didn't understand your concern well.
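
As a small illustration of that index shifting, here is a minimal sketch. The list contents are made up and the names `items`, `removed` and `first_after_pop` are only for this example:

```python
# Illustrative values only; they are not taken from the question.
items = [(1, 10), (2, 20), (4, 40)]

removed = items.pop(0)       # remove and return the first element
# The remaining elements shift down: index 0 now holds the former index 1.
first_after_pop = items[0]

print(removed, first_after_pop, len(items))
```

Indexing by position keeps working after a pop, but the position no longer tells you anything about the time stored in the tuple.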

Missing data: yes. So you need to return multiple values when one is missing: the upper and the lower bounds.

[(elt[0],data_list[i][1]-elt[1]) if data_list[i][0]==elt[0] else ((elt[0],data_list[i][1]-ref_list[i-1][1]),(elt[0],data_list[i][1]-ref_list[i+1][1])) for i,elt in enumerate(ref_list)]

This way, if a value is missing, it looks up the previous value and the next value, so you get the bounds of the missing value. In the 'else' branch I have no choice but to return the tuples inside another structure, because a comprehension can produce only one value per iteration (otherwise you get a SyntaxError: invalid syntax at the 'for').

Even though you may need these tuples of tuples (to detect that a value is missing), here is another solution: an explicit generator.
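
If a flat list is wanted from a comprehension anyway, one possible approach is a second `for` clause that iterates over a small tuple of results, so each input can contribute one or two items. This is only a sketch with made-up numbers; `pairs` and `flat` are illustrative names, and the even/odd test merely stands in for the "value is missing" check:

```python
# Made-up (time, value) pairs, purely for illustration.
pairs = [(1, 10), (2, 20), (3, 30)]

# Even times contribute one item, odd times contribute two. The inner
# `for` flattens whatever tuple of results each input produces.
flat = [
    item
    for t, v in pairs
    for item in (((t, v),) if t % 2 == 0 else ((t, v - 1), (t, v + 1)))
]

print(flat)
```

The price is that the flat result no longer shows which items came from a missing value.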

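def generator_stuff(data_list, ref_list):
    # Assumes data_list has at least as many entries as ref_list.
    for i, elt in enumerate(ref_list):
        if data_list[i][0] == elt[0]:
            # Times match: yield a single difference.
            yield (elt[0], data_list[i][1] - elt[1])
        else:
            # Times differ: yield the differences against the neighbouring
            # reference values, skipping a neighbour that does not exist.
            if i > 0:
                yield (elt[0], data_list[i][1] - ref_list[i-1][1])
            if i + 1 < len(ref_list):
                yield (elt[0], data_list[i][1] - ref_list[i+1][1])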

I have no idea how this performs, but since it returns each tuple individually, you won't get tuples of tuples.
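
For the "use the last available time" behaviour described in the question, a different approach is to look up, for every time that appears in either list, the latest entry at or before that time. The sketch below assumes both lists are sorted by time with unique times. The helper names `last_value_at` and `diff_with_last_known` and the sample numbers are hypothetical, not from the question:

```python
from bisect import bisect_right


def last_value_at(times, values, t):
    """Return the value at the latest time <= t, or None if there is none."""
    pos = bisect_right(times, t)
    return values[pos - 1] if pos > 0 else None


def diff_with_last_known(data_list, ref_list):
    """Yield (time, data - ref) for every time present in either list."""
    data_times = [t for t, _ in data_list]
    data_values = [v for _, v in data_list]
    ref_times = [t for t, _ in ref_list]
    ref_values = [v for _, v in ref_list]
    for t in sorted(set(data_times) | set(ref_times)):
        d = last_value_at(data_times, data_values, t)
        r = last_value_at(ref_times, ref_values, t)
        # Skip times before both lists have a value to compare.
        if d is not None and r is not None:
            yield (t, d - r)


# Sample numbers standing in for value_ref_* and value_dat_*.
ref_list = [(1, 10), (3, 30), (4, 40)]
data_list = [(1, 11), (2, 22), (4, 44)]
res_list = list(diff_with_last_known(data_list, ref_list))
print(res_list)
```

With these sample numbers, time 2 is paired with the reference value from time 1 and time 3 with the data value from time 2, which mirrors the pattern in the question's res_list.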

Upvotes: 1

Xevelion

Reputation: 869

I also ran the following with two lists of 500k values each; memory usage was stable at about 100 MB to 200 MB, depending on the generation parameters.

list_a = [(1,222),(2,444),(5,666),(10,888)]
list_b = [(1,111),(3,333),(7,555),(9,777),(10,888)]

list_c = []

i = 1
a = None
b = None


def get_new(a, for_time):
    if len(a) == 0:
        raise IndexError

    # in the future
    if a[0][0] > for_time:
        return None

    return a.pop(0)

list_a_exhausted = False
list_b_exhausted = False

while True:     
    try:
        a = get_new(list_a,i) or a
    except IndexError:
        list_a_exhausted = True

    try:
        b = get_new(list_b,i) or b  
    except IndexError:
        list_b_exhausted = True

    if list_a_exhausted and list_b_exhausted:
        break

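    # Append a plain tuple; skip times before both lists have a value.
    if a is not None and b is not None:
        list_c.append((i, b[1] - a[1]))
    i = i + 1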
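
A possible variant of the loop above, as a sketch rather than a tuned implementation: it walks both lists with index positions instead of `pop(0)` (which shifts every remaining element on each call), leaves the inputs untouched, and only emits times that actually occur in one of the lists. It assumes both lists are sorted by time; `merge_differences` is a hypothetical name:

```python
def merge_differences(list_a, list_b):
    """Return [(time, b - a)], reusing each list's last known value."""
    result = []
    ia = ib = 0      # read positions; the inputs are left unmodified
    a = b = None     # last value seen so far in each list
    while ia < len(list_a) or ib < len(list_b):
        # Handle the smallest time that has not been read yet.
        pending = []
        if ia < len(list_a):
            pending.append(list_a[ia][0])
        if ib < len(list_b):
            pending.append(list_b[ib][0])
        t = min(pending)
        if ia < len(list_a) and list_a[ia][0] == t:
            a = list_a[ia][1]
            ia += 1
        if ib < len(list_b) and list_b[ib][0] == t:
            b = list_b[ib][1]
            ib += 1
        # Skip times before both lists have produced a value.
        if a is not None and b is not None:
            result.append((t, b - a))
    return result


list_a = [(1, 222), (2, 444), (5, 666), (10, 888)]
list_b = [(1, 111), (3, 333), (7, 555), (9, 777), (10, 888)]
list_c = merge_differences(list_a, list_b)
print(list_c)
```

Unlike the loop above, this does not produce an entry for every integer time between the first and last timestamps, only for times present in the data.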

Upvotes: 1
