RAHenriksen

Reputation: 163

Select element in list of values in dictionary

I have a dictionary whose values are lists of two or more elements. I want to reduce the lists that have more than two elements, based on the values found in the lists that have exactly two elements.

I know I can append all the value elements to a list and then count the most common one, but I need to keep the key information and the dictionary format, so that doesn't really work. I can't figure out how to approach this problem.

My dictionary looks like this:

start_dict = {
    'Key1': [243928620, 243938319],
    'Key2': [243928620, 243938319],
    'Key3': [243928620, 243931757, 243938319],
    'Key4': [243928620, 243938319, 243938323],
    'Key5': [243928634, 243938316],
    'Key6': [243928620, 243938319],
    'Key7': [243928634, 243938317],
    'Key8': [243928620, 243938329,243938387]
}

I want to keep the first element of every value list unaltered, as it's a start coordinate; the remaining elements are potential end coordinates for a given interval.

Then, for the values with more than two elements in their list (Key3, Key4 and Key8), I want to keep only the end coordinate that is most frequent in the value lists of the other keys. This applies to Key3 and Key4, as they both contain the most frequent end coordinate, 243938319.

If none of their end coordinates are present in any of the other lists, I'll just keep them all, which is the case for Key8.

The most frequent values across all keys are 243928620 for the start position and 243938319 for the end position. So the ideal output would be:

start_dict = {
    'Key1': [243928620, 243938319],
    'Key2': [243928620, 243938319],
    'Key3': [243928620, 243938319],
    'Key4': [243928620, 243938319],
    'Key5': [243928634, 243938316],
    'Key6': [243928620, 243938319],
    'Key7': [243928634, 243938317],
    'Key8': [243928620, 243938329,243938387]
}

I can't get my head around how this could be done, or whether it can even be done in a smart way.

Would any of you be able to help? Thanks for your time.

Upvotes: 1

Views: 6445

Answers (3)

Paritosh Singh

Reputation: 6246

With regards to a different structure for storing this information:

start_dict = {
    'Key1': [243928620, 243938319],
    'Key2': [243928620, 243938319],
    'Key3': [243928620, 243931757, 243938319],
    'Key4': [243928620, 243938319, 243938323],
    'Key5': [243928634, 243938316],
    'Key6': [243928620, 243938319],
    'Key7': [243928634, 243938317],
    'Key8': [243928620, 243938329,243938387]
}

modified_dict = {k:{"start":v[0], "end":v[1:]} for k, v in start_dict.items()}
print(modified_dict)
#Output:
{'Key1': {'start': 243928620, 'end': [243938319]},
 'Key2': {'start': 243928620, 'end': [243938319]},
 'Key3': {'start': 243928620, 'end': [243931757, 243938319]},
 'Key4': {'start': 243928620, 'end': [243938319, 243938323]},
 'Key5': {'start': 243928634, 'end': [243938316]},
 'Key6': {'start': 243928620, 'end': [243938319]},
 'Key7': {'start': 243928634, 'end': [243938317]},
 'Key8': {'start': 243928620, 'end': [243938329, 243938387]}}

A dict of dicts like the one above may be clearer to both use and maintain, so you can consider using a structure like this. Alternatively, a two-element tuple could work as well, but I find this version the most readable.
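For comparison, the tuple alternative mentioned above could look roughly like this. It is only a sketch: the name `tuple_dict` and the shortened sample data are made up for illustration, and the rest of this answer continues with the dict-of-dicts version.

```python
# Shortened sample data (two keys taken from the question's dictionary)
start_dict = {
    'Key1': [243928620, 243938319],
    'Key3': [243928620, 243931757, 243938319],
}

# Each value becomes a (start, list_of_ends) pair
tuple_dict = {k: (v[0], v[1:]) for k, v in start_dict.items()}

# Tuple unpacking gives access to both parts
start, ends = tuple_dict['Key3']
print(start)  # 243928620
print(ends)   # [243931757, 243938319]
```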

Taking this as a starting point:

#collect all possible end points for every key, and combine in a list
end_points = []
for k, v in modified_dict.items():
    end_points.extend(v["end"])

#find the most common end point
from collections import Counter
most_common = Counter(end_points).most_common(1)[0][0]

#Adjust the end points if the most common end point is found
for k, v in modified_dict.items():
    if most_common in v["end"]:
        modified_dict[k]["end"] = [most_common]
print(modified_dict)
#Output:
{'Key1': {'start': 243928620, 'end': [243938319]},
 'Key2': {'start': 243928620, 'end': [243938319]},
 'Key3': {'start': 243928620, 'end': [243938319]},
 'Key4': {'start': 243928620, 'end': [243938319]},
 'Key5': {'start': 243928634, 'end': [243938316]},
 'Key6': {'start': 243928620, 'end': [243938319]},
 'Key7': {'start': 243928634, 'end': [243938317]},
 'Key8': {'start': 243928620, 'end': [243938329, 243938387]}}
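If the original list layout is needed again afterwards (the question asks to keep the dictionary format), the nested structure can be flattened back. A minimal sketch, assuming `modified_dict` has the shape shown above; `flat_dict` is just an illustrative name and the data is shortened to two keys:

```python
# Two entries in the shape produced above (shortened sample)
modified_dict = {
    'Key3': {'start': 243928620, 'end': [243938319]},
    'Key8': {'start': 243928620, 'end': [243938329, 243938387]},
}

# Rebuild [start, end1, end2, ...] lists under the same keys
flat_dict = {k: [v['start'], *v['end']] for k, v in modified_dict.items()}
print(flat_dict)
# {'Key3': [243928620, 243938319], 'Key8': [243928620, 243938329, 243938387]}
```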

Upvotes: 1

Born Tbe Wasted

Reputation: 610

I prefer the other answer, but this can still teach you a few things about list comprehensions.

Create one dict with the start points and one dict with the lists of all the endpoints:

startpoints = {k:v[0]  for k,v in start_dict.items()}
endpoints = {k:v[1:] for k,v in start_dict.items()}

Then flatten it:

# "ends" is used as the loop name so the built-in "list" is not shadowed
endpoints_flatten = [value for ends in endpoints.values() for value in ends]

Create a counter that counts all the endpoints:

from collections import Counter
c = Counter(endpoints_flatten)

Create a function that returns the most common endpoint in a list:

def most_com(list_endpoint,c):
    return max(list_endpoint, key=lambda l : c[l])

Now go through the lists of endpoints and keep only the most common one for each key:

common_endpoint = {k:most_com(list_endpoint,c) for k,list_endpoint in endpoints.items()}

Now output all of it:

# Build a [start, end] list per key; adding the two ints would sum them instead
output = {k: [v, common_endpoint[k]] for k, v in startpoints.items()}
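Put together on the question's data, the steps look like the sketch below. Two details are assumptions added here rather than part of the steps above: each value is rebuilt as a list of the start followed by the kept end(s), and `pick_endpoints` (a hypothetical helper) keeps all endpoints when none of them occurs under another key, because `max` alone would reduce Key8 to its first endpoint.

```python
from collections import Counter

start_dict = {
    'Key1': [243928620, 243938319],
    'Key2': [243928620, 243938319],
    'Key3': [243928620, 243931757, 243938319],
    'Key4': [243928620, 243938319, 243938323],
    'Key5': [243928634, 243938316],
    'Key6': [243928620, 243938319],
    'Key7': [243928634, 243938317],
    'Key8': [243928620, 243938329, 243938387],
}

startpoints = {k: v[0] for k, v in start_dict.items()}
endpoints = {k: v[1:] for k, v in start_dict.items()}

# Count every endpoint across all keys
c = Counter(value for ends in endpoints.values() for value in ends)

def pick_endpoints(ends, c):
    # Hypothetical helper: keep the most common endpoint, unless its
    # count is 1, meaning no endpoint of this list occurs elsewhere.
    # Assumes a single list never repeats the same endpoint.
    best = max(ends, key=lambda e: c[e])
    return [best] if c[best] > 1 else ends

output = {k: [startpoints[k]] + pick_endpoints(ends, c)
          for k, ends in endpoints.items()}
print(output['Key3'])  # [243928620, 243938319]
print(output['Key8'])  # [243928620, 243938329, 243938387]
```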

Upvotes: 0

javidcf

Reputation: 59681

This is a way to do that:

from collections import Counter
from pprint import pprint

def reduce_coords(data):
    # Counter of second list element for 2-element lists
    count = Counter(v[1] for v in data.values() if len(v) == 2)
    # Result dict
    result = {}
    # Iterate data entries
    for k, v in data.items():
        # Modify lists longer than two with at least one element in the counter
        if len(v) > 2 and any(elem in count for elem in v[1:]):
            # Replace list with first element and following element with max count
            v = [v[0], max(v[1:], key=lambda elem: count.get(elem, 0))]
        # Add to result
        result[k] = v
    return result

start_dict = {
    'Key1': [243928620, 243938319],
    'Key2': [243928620, 243938319],
    'Key3': [243928620, 243931757, 243938319],
    'Key4': [243928620, 243938319, 243938323],
    'Key5': [243928634, 243938316],
    'Key6': [243928620, 243938319],
    'Key7': [243928634, 243938317],
    'Key8': [243928620, 243938329,243938387]
}
pprint(reduce_coords(start_dict))
# {'Key1': [243928620, 243938319],
#  'Key2': [243928620, 243938319],
#  'Key3': [243928620, 243938319],
#  'Key4': [243928620, 243938319],
#  'Key5': [243928634, 243938316],
#  'Key6': [243928620, 243938319],
#  'Key7': [243928634, 243938317],
#  'Key8': [243928620, 243938329, 243938387]}

Upvotes: 2
