Learner

How to zip two lists with duplicate elements?

There are two lists:

l1 = ['k1','k2','k3','k3','k4', 'k5']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9"]

I need to turn these items into key:value pairs in a dictionary, keeping the highest value for duplicate keys. Since a dictionary only holds unique keys, one of the duplicate elements gets overwritten:

print(dict(zip(l1, l2)))
{'k1': '1.2.3', 'k2': 'abc-2.3.4', 'k3': 'xyz-def-5.6.7', 'k4': 'ghjb-5.6.7', 'k5': '7.8.9'}

But in the output above, I need the highest value, xyz-def-5.6.8, instead of xyz-def-5.6.7.

I tried print(list(zip(l1, l2))), which outputs:

[('k1', '1.2.3'), ('k2', 'abc-2.3.4'), ('k3', 'xyz-def-5.6.8'), ('k3', 'xyz-def-5.6.7'), ('k4', 'ghjb-5.6.7'), ('k5', '7.8.9')]

How do I achieve this?

Is it possible to process this list of tuples, or is there another way to get the desired result?

l1 = ['k1','k2','k3','k3','k4', 'k5', 'k6', 'k7', 'k6']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9", "1:2.3.4-3ubuntu0.1", "1.2.3-1.2build3", "1:2.3.4-3ubuntu0.2"]

The values can't share one format across all the keys, but they do share a format within a given duplicate key: say, k6 has one format and k3 has another.

Answers (2)

Learner

Since the format stays the same within a duplicate key but differs across other keys, I used the method below: collect the indices of each duplicated key, sort the corresponding values, and keep the highest one.

l1 = ['k1','k2','k3','k3','k4', 'k5', 'k6', 'k7', 'k6']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9", "1:2.3.4-3ubuntu0.1", "1.2.3-1.2build3", "1:2.3.4-3ubuntu0.2"]
l3 = {}

def get_duplicates_details(list_of_elems):
    test = {}
    for index, value in enumerate(list_of_elems):
        if value in test:
            test[value].append(index)
        else:
            test[value] = [index]

    dictOfElems = {key: value for key, value in test.items() if len(value) > 1}
    return dictOfElems

dictOfElems = get_duplicates_details(l1)
print(dictOfElems)

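# For duplicated keys, sort all their values as plain strings and keep the
# last (largest) one; every other key keeps its single value
for index2, value2 in enumerate(l1):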
    if value2 in dictOfElems:
        tmp = [l2[j] for j in dictOfElems[value2]]
        tmp.sort()
        l3[value2] = tmp[-1]
    else:
        l3[value2] = l2[index2]

print(l3)

Output:

{'k3': [2, 3], 'k6': [6, 8]}
{'k1': '1.2.3', 'k2': 'abc-2.3.4', 'k3': 'xyz-def-5.6.8', 'k4': 'ghjb-5.6.7', 'k5': '7.8.9', 'k6': '1:2.3.4-3ubuntu0.2', 'k7': '1.2.3-1.2build3'}
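The index bookkeeping above can be collapsed into a single pass over zip(l1, l2). This is a sketch that keeps the same rule as the tmp.sort() / tmp[-1] step, i.e. the value that is largest as a plain string; the function name highest_per_key is hypothetical.

```python
def highest_per_key(keys, values):
    """Zip keys with values, keeping the lexicographically largest value per key."""
    result = {}
    for key, value in zip(keys, values):
        # First occurrence of the key, or a value that sorts after the stored one
        if key not in result or value > result[key]:
            result[key] = value
    return result


l1 = ['k1','k2','k3','k3','k4', 'k5', 'k6', 'k7', 'k6']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9", "1:2.3.4-3ubuntu0.1", "1.2.3-1.2build3", "1:2.3.4-3ubuntu0.2"]

print(highest_per_key(l1, l2))
```

This prints the same dictionary as the second output line above. Note that plain string comparison goes character by character, so "xyz-def-5.6.10" sorts before "xyz-def-5.6.9" (because '1' < '9'); the tmp.sort() approach has the same limitation.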

Patrick Artner

You need some way to "tell" your dict which value to choose if the key already exists - and it has to know how to decide between two values.

i need highest value xyz-def-5.6.8 instead of xyz-def-5.6.7

The provided function prioritize implements that.

You could, for example, do this:

l1 = ['k1','k2','k3','k3','k4', 'k5']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9"]

def prioritize(a,b):
    """Split the value by -, take the last part, split it by . and convert it
    to an int tuple for comparison. Return a or b, whichever is bigger."""
    def extract(what):
        """Split into int tuples"""
        return tuple(map(int, (what.split("-")[-1]).split(".")))

    # 'xyz-def-5.6.8' => (5,6,8)
    a_num = extract(a)

    # 'xyz-def-5.6.7' => (5,6,7)
    b_num = extract(b)

    # int tuple comparison "just works"
    return a if a_num > b_num else b

d = {}
for (k,v) in zip(l1,l2):
    # keep the existing value if it is higher, otherwise use the new one
    d[k] = prioritize(d.get(k,v), v)

print(d)

Output:

{'k1': '1.2.3', 
 'k2': 'abc-2.3.4', 
 'k3': 'xyz-def-5.6.8', 
 'k4': 'ghjb-5.6.7', 
 'k5': '7.8.9'}
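prioritize assumes everything after the last - consists of dot-separated integers, so it raises ValueError on the question's second example: for "1:2.3.4-3ubuntu0.1" the last part is "3ubuntu0.1", and int("3ubuntu0") fails. A more tolerant sketch, assuming that comparing every run of digits numerically from left to right is a good enough notion of "highest" (this is not dpkg's real version ordering), passes a key function instead; numeric_key and zip_keep_highest are hypothetical names.

```python
import re


def numeric_key(version):
    """Hypothetical key: every run of digits in the string, as a tuple of ints.

    'xyz-def-5.6.8'      -> (5, 6, 8)
    '1:2.3.4-3ubuntu0.2' -> (1, 2, 3, 4, 3, 0, 2)
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def zip_keep_highest(keys, values, key=numeric_key):
    """Zip keys with values; for duplicate keys keep the value with the largest key(value)."""
    result = {}
    for k, v in zip(keys, values):
        # int tuples compare element by element, like the (5,6,8) tuples above
        if k not in result or key(v) > key(result[k]):
            result[k] = v
    return result


l1 = ['k1','k2','k3','k3','k4', 'k5', 'k6', 'k7', 'k6']
l2 = ["1.2.3","abc-2.3.4","xyz-def-5.6.8", "xyz-def-5.6.7","ghjb-5.6.7","7.8.9", "1:2.3.4-3ubuntu0.1", "1.2.3-1.2build3", "1:2.3.4-3ubuntu0.2"]
print(zip_keep_highest(l1, l2))

# Numeric comparison puts 5.6.10 after 5.6.9, unlike plain string comparison
print(zip_keep_highest(['k3', 'k3'], ['xyz-def-5.6.10', 'xyz-def-5.6.9']))  # {'k3': 'xyz-def-5.6.10'}
```

Because the key only looks at digits, it is meaningful only when the values being compared share a format, which matches the question's statement that duplicates of one key do. The key parameter lets you swap in a stricter parser for a given format.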
