Reputation: 55

Filter dictionary of tuples with duplicated first element of the values based on some condition

Sample input data in which Value1 is duplicated across several keys

{'Key1': ('Value1', '28.302', '30', '131', '10', '321'), 
 'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
 'Key3': ('Value1', '34.048', '4', '375', '22', '1275'), 
 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

Desired result, in which Key1 and Key3 have been filtered out because, among the entries whose first element is Value1, Key5 has the highest second element.

{'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

My attempts so far have failed, and they are probably not worth posting here!

Upvotes: 0

Answers (3)

Mohamed Mehdawy

Reputation: 44

First, get all the dictionary keys from a copy rather than from the live dictionary, using the copy method (so all_keys is not affected when the dictionary changes).
Declare current_key to track the position of the current key.
The first loop iterates over all keys.
On each iteration, get the new number of keys.
The second loop checks whether the first element of the current key's value equals the first element of a later key's value.
If the condition is true, remove the current key from the dictionary and leave the second loop.
Each time the second loop ends, increment current_key by one.

Edit

In the second loop we don't need to fetch the dict keys every time; we can get them once in the first loop.

Edit 2

When a key is removed from the dict, we shouldn't increment current_key, because the next key moves into its position, so I added a flag to control this.

my_dic = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
            'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
            'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
            'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
            'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
# all keys, taken from a copy so removals below do not affect the iteration
all_keys = my_dic.copy().keys()
# position of the current key
current_key = 0

for key in all_keys:
    # get the new number of keys
    number_of_keys = len(my_dic)
    # current dictionary keys
    dict_keys = list(my_dic.keys())
    # flag: stays True if the current key was kept
    flag = True
    for index in range(current_key+1, number_of_keys):

        # if the condition is true, remove this key
        # from the dictionary and leave the second loop
        if my_dic[key][0] == my_dic[dict_keys[index]][0]:
            my_dic.pop(key)
            flag = False
            break
    # if the flag is still True, move on to the next key;
    # otherwise stay, because the next element is now the current one
    if flag:
        current_key += 1

You can wrap this code in a function to reuse it at any time:

# filterDictionary
def filterDictionary(user_dict):
    """
        Return a new, filtered dictionary.
        params:
            - user_dict: the user dictionary to filter
    """
    # work on a copy so the caller's dictionary is not changed
    user_dict = user_dict.copy()

    # all keys, taken from a copy so removals below do not affect the iteration
    all_keys = user_dict.copy().keys()

    # position of the current key
    current_key = 0

    for key in all_keys:
        # get the new number of keys
        number_of_keys = len(user_dict)

        # current dictionary keys
        dict_keys = list(user_dict.keys())

        # flag: stays True if the current key was kept
        flag = True

        for index in range(current_key+1, number_of_keys):
            # if the condition is true, remove this key
            # from the dictionary and leave the second loop
            if user_dict[key][0] == user_dict[dict_keys[index]][0]:
                user_dict.pop(key)
                flag = False
                break
        # if the flag is still True, move on to the next key;
        # otherwise stay, because the next element is now the current one
        if flag:
            current_key += 1

    return user_dict

my_dic = {'Key1': ('Value2', '28.302', '30', '131', '10', '321'),
            'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
            'Key3': ('Value3', '34.048', '4', '375', '22', '1275'),
            'Key4': ('Value2', '47.561', '82', '159', '901', '1146'),
            'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

print(filterDictionary(my_dic))
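
The loop above compares only the first elements, so among duplicates it keeps whichever key comes last in the dictionary. If the key with the highest second element should be kept instead, as the question asks, a single pass could look roughly like the sketch below. It assumes the second element always parses as a float, and keep_max_per_group is a hypothetical name, not part of the original code.

```python
def keep_max_per_group(data):
    """Keep, for each first element, the key whose second element is numerically largest."""
    best = {}  # first element -> (key, value) of the current winner
    for key, value in data.items():
        label, score = value[0], float(value[1])
        # Replace the stored winner only when this entry has a strictly larger score.
        if label not in best or score > float(best[label][1][1]):
            best[label] = (key, value)
    winners = {key for key, _ in best.values()}
    # Rebuild the result in the original insertion order.
    return {key: value for key, value in data.items() if key in winners}


sample = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
          'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
          'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
          'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
          'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

print(keep_max_per_group(sample))
```

On ties, this sketch keeps the entry that appears first; changing `>` to `>=` would keep the last one instead.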

Upvotes: 1

Eelco van Vliet

Reputation: 1238

As an alternative, you could use a pandas data frame approach

import pandas as pd

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

data = pd.DataFrame.from_dict(d, orient='index').reset_index()
data.rename(columns={"index": "Key", 0: "Value"}, inplace=True)
data = data.set_index(["Key", "Value"], drop=True).sort_index(ascending=True)

At this point you have turned your dict into a multiindex dataframe:

                  1    2    3     4     5
Key  Value                               
Key1 Value1  28.302   30  131    10   321
Key2 Value2  42.373   44   98  1252  1413
Key3 Value1  34.048    4  375    22  1275
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282

This allows you to do all kinds of operations. Finding the rows you want would look like this:

max_rows = list()
sort_column = 1
for key_name, df in data.groupby("Value"):
    max_row = df.sort_values(sort_column, ascending=False).head(1)
    max_rows.append(max_row)
result = pd.concat(max_rows).sort_index()
print(result)

This gives you a DataFrame which looks like this:

                  1    2    3     4     5
Key  Value                               
Key2 Value2  42.373   44   98  1252  1413
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282

If you need a dict with the tuples back, you can do:

result2 = dict()
for index, row in result.iterrows():
    result2[index[0]] = tuple([index[1]] + row.values.tolist())

giving the desired result:

 {'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
  'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 
  'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

Most likely the solution by Green Cloak Guy is faster, but once the dict has been turned into a (multiindex) dataframe, it is probably easier to manipulate your data.
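
One caveat: the frame built from the dict holds strings, so sort_values orders column 1 lexicographically. That happens to work for the sample data, but it would not for values of different widths. A shorter variant that compares numbers might look like the sketch below. It assumes column 1 always parses as a float, and the names scores, winners and kept are only illustrative.

```python
import pandas as pd

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

# Rows are labelled by the dict keys; column 0 is the group, column 1 the score.
df = pd.DataFrame.from_dict(d, orient='index')

# Convert the score column to numbers so the comparison is numeric, not textual.
scores = df[1].astype(float)

# For each group in column 0, idxmax gives the row label (dict key) of the largest score.
winners = scores.groupby(df[0]).idxmax()

# Select the winning rows (still holding the original strings), sorted by key.
kept = df.loc[sorted(winners)]

# Turn the rows back into a dict of tuples.
result = dict(zip(kept.index, kept.itertuples(index=False, name=None)))
print(result)
```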

Upvotes: 1

Green Cloak Guy

Reputation: 24691

You'd probably want to do this in a multi-step process.

import itertools

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'), 'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 'Key3': ('Value1', '34.048', '4', '375', '22', '1275'), 'Key4': ('Value3', '47.561', '82', '159', '901', '1146'), 'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

filtered = dict(
    max(group, key=lambda tup: tup[1])
    for _, group in itertools.groupby(
        sorted(d.items(), key=lambda tup: tup[1]),
        key=lambda tup: tup[1][0]                 
    )
)
# {'Key5': ('Value1', '35.821', '214', '279', '82', '282'), 
#  'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'), 
#  'Key4': ('Value3', '47.561', '82', '159', '901', '1146')}

The process is:

1. Use sorted() to rearrange d.items() so that all identical [first element of tuple] are next to each other (otherwise groupby() won't work).
2. Use itertools.groupby() to collect all items with the same first element of that tuple.
3. Use max() to take the max of each group.
4. Convert the list of (key, value) tuples back into a dict.

You can insert another sorted() between steps 3 and 4 if you want the keys inserted in a specific order - but it's a dict, so order ought not matter.
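
Note that the tuple elements here are strings, so max() above compares them lexicographically: '9.5' would beat '10.2'. That is fine for the sample data, where the numbers all have the same width. If the values can vary in width, a variant that compares the second element as a number might look like this sketch. It assumes the second element always parses as a float; filter_max_numeric and by_label are hypothetical names.

```python
import itertools


def filter_max_numeric(d):
    """Group items by the first tuple element and keep the numerically largest second element."""
    # Grouping key: the first element of the value tuple.
    by_label = lambda item: item[1][0]
    return dict(
        # Within each group, compare the second element as a float, not as a string.
        max(group, key=lambda item: float(item[1][1]))
        for _, group in itertools.groupby(sorted(d.items(), key=by_label), key=by_label)
    )


d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

print(filter_max_numeric(d))
```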

Upvotes: 2
