Reputation: 55
Sample input data, in which Value1 is duplicated across several keys:
{'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
Desired result, in which Key1 and Key3 have been filtered out because, among the keys whose first element is Value1, Key5 has the highest second element ('35.821'):
{'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
My attempts so far have failed, and they are probably not worth posting here!
Upvotes: 0
Views: 140
Reputation: 44
Edit
In the inner loop we don't need to rebuild the list of dict keys on every iteration; we can get it once per pass of the outer loop.
Edit 2
When we remove a key from the dict we shouldn't increment current_key, because the next key moves into the current position, so I added a flag to control this.
my_dic = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
# all keys (taken from a copy, so popping from my_dic doesn't disturb the iteration)
all_keys = my_dic.copy().keys()
# index of the current key
current_key = 0
for key in all_keys:
    # get the new number of keys
    number_of_keys = len(my_dic)
    # the dictionary's current keys
    dict_keys = list(my_dic.keys())
    # flag: stays True if the current key is kept
    flag = True
    for index in range(current_key + 1, number_of_keys):
        # if the condition is true, remove this key
        # from the dictionary and leave the inner loop
        if my_dic[key][0] == my_dic[dict_keys[index]][0]:
            my_dic.pop(key)
            flag = False
            break
    # when the flag is True we move on to the next key;
    # if not, we stay, because the next element becomes the current one
    if flag:
        current_key += 1
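The flag is needed because popping a key shifts every later key one position to the left in list(my_dic.keys()). A tiny check of that behaviour, using a hypothetical three-key dict whose values are placeholders:

```python
# Toy dict with made-up values; only the key order matters here.
d = {'Key1': 'a', 'Key2': 'b', 'Key3': 'c'}

keys_before = list(d.keys())
d.pop('Key1')
keys_after = list(d.keys())

# 'Key2' sat at position 1 before the pop and at position 0 afterwards,
# so advancing current_key right after a pop would skip over it.
print(keys_before.index('Key2'), keys_after.index('Key2'))  # 1 0
```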
You can wrap this code in a function so you can reuse it any time:
# filterDictionary
def filterDictionary(user_dict):
    """
    Return a new dictionary after filtering it.
    params:
    - user_dict: the user's dictionary to filter
    """
    # work on a copy so the caller's dictionary is not affected
    user_dict = user_dict.copy()
    # all keys
    all_keys = user_dict.copy().keys()
    # index of the current key
    current_key = 0
    for key in all_keys:
        # get the new number of keys
        number_of_keys = len(user_dict)
        # the dictionary's current keys
        dict_keys = list(user_dict.keys())
        # flag: stays True if the current key is kept
        flag = True
        for index in range(current_key + 1, number_of_keys):
            # if the condition is true, remove this key
            # from the dictionary and leave the inner loop
            if user_dict[key][0] == user_dict[dict_keys[index]][0]:
                user_dict.pop(key)
                flag = False
                break
        # when the flag is True we move on to the next key;
        # if not, we stay, because the next element becomes the current one
        if flag:
            current_key += 1
    return user_dict
my_dic = {'Key1': ('Value2', '28.302', '30', '131', '10', '321'),
'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key3': ('Value3', '34.048', '4', '375', '22', '1275'),
'Key4': ('Value2', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
print(filterDictionary(my_dic))
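Note that this loop keeps the last key seen for each first element; on the question's data it matches the desired output only because Key5 happens to be the last Value1 entry. A single-pass sketch that instead keeps the entry with the highest second element could look like this. The function name keep_max_per_first_value is hypothetical, and it assumes the second element is always a numeric string, so it is compared as a float (a plain string comparison would rank '9.5' above '35.821'):

```python
def keep_max_per_first_value(user_dict):
    """Return a new dict keeping, for each distinct first element, only the
    entry whose second element (parsed as a float) is largest.

    Assumes every value is a tuple with at least two elements and that the
    second element is a numeric string such as '35.821'. On ties, the entry
    seen first is kept.
    """
    best = {}  # first element -> key of the best entry seen so far
    for key, value in user_dict.items():
        group = value[0]
        if group not in best or float(value[1]) > float(user_dict[best[group]][1]):
            best[group] = key
    kept = set(best.values())
    # rebuild the result in the original key order; user_dict is not modified
    return {key: value for key, value in user_dict.items() if key in kept}


my_dic = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
          'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
          'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
          'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
          'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

print(keep_max_per_first_value(my_dic))  # keeps Key2, Key4 and Key5
```

Because each entry is looked at once, this runs in linear time instead of the nested loops above.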
Upvotes: 1
Reputation: 1238
As an alternative, you could use a pandas data frame approach
import pandas as pd
d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
data = pd.DataFrame.from_dict(d, orient='index').reset_index()
data.rename(columns={"index": "Key", 0: "Value"}, inplace=True)
data = data.set_index(["Key", "Value"], drop=True).sort_index(ascending=True)
At this point you have turned your dict into a DataFrame with a MultiIndex:
                  1    2    3     4     5
Key  Value
Key1 Value1  28.302   30  131    10   321
Key2 Value2  42.373   44   98  1252  1413
Key3 Value1  34.048    4  375    22  1275
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282
This allows you to do all kinds of operations. Finding the rows you want (for each Value, the row with the highest number in column 1) could look like this:
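max_rows = list()
sort_column = 1
for _, df in data.groupby("Value"):
    # column 1 holds the numbers as strings, so sort them as floats
    # (the key argument of sort_values needs pandas 1.1 or newer)
    max_row = df.sort_values(sort_column, ascending=False,
                             key=lambda col: col.astype(float)).head(1)
    max_rows.append(max_row)
result = pd.concat(max_rows).sort_index()
print(result)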
This gives you a DataFrame that looks like this:
                  1    2    3     4     5
Key  Value
Key2 Value2  42.373   44   98  1252  1413
Key4 Value3  47.561   82  159   901  1146
Key5 Value1  35.821  214  279    82   282
If you need a dict of tuples back, you can do:
result2 = dict()
for index, row in result.iterrows():
    # index is the (Key, Value) pair from the MultiIndex
    result2[index[0]] = tuple([index[1]] + row.values.tolist())
giving the desired result:
{'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
The solution by Green Cloak Guy is most likely faster, but once the dict has been turned into a DataFrame with a MultiIndex, it is probably easier to manipulate your data further.
Upvotes: 1
Reputation: 24691
You'd probably want to do this in a multi-step process.
import itertools
d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}
filtered = dict(
    # compare the second element as a float rather than as a string
    max(group, key=lambda tup: float(tup[1][1]))
    for _, group in itertools.groupby(
        sorted(d.items(), key=lambda tup: tup[1]),
        key=lambda tup: tup[1][0]
    )
)
# {'Key5': ('Value1', '35.821', '214', '279', '82', '282'),
# 'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
# 'Key4': ('Value3', '47.561', '82', '159', '901', '1146')}
The process is:
1. sorted() to rearrange d.items() so that all identical [first element of tuple] are next to each other (otherwise groupby() won't work)
2. itertools.groupby() to collect all items with the same first element of that tuple
3. max() to take the max of each group, judged by the second element of the tuple
4. dict() to turn the (key, value) tuples back into a dict

You can insert another sorted() between steps 3 and 4 if you want the keys inserted in a specific order, but it's a dict, so the order ought not to matter.
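The four steps can also be written out one at a time, which makes each intermediate value easy to inspect. This sketch sorts on the first element only (rather than on the whole tuple) and compares the second element as a float; both are choices made here under the assumption that the second element is always a numeric string:

```python
import itertools

d = {'Key1': ('Value1', '28.302', '30', '131', '10', '321'),
     'Key2': ('Value2', '42.373', '44', '98', '1252', '1413'),
     'Key3': ('Value1', '34.048', '4', '375', '22', '1275'),
     'Key4': ('Value3', '47.561', '82', '159', '901', '1146'),
     'Key5': ('Value1', '35.821', '214', '279', '82', '282')}

# Step 1: sort the (key, value) pairs so equal first elements are adjacent
pairs = sorted(d.items(), key=lambda item: item[1][0])

# Step 2: group consecutive pairs that share the same first element
groups = itertools.groupby(pairs, key=lambda item: item[1][0])

# Step 3: from each group keep the pair whose second element is largest
winners = [max(group, key=lambda item: float(item[1][1])) for _, group in groups]

# Step 4: turn the surviving (key, value) pairs back into a dict
filtered = dict(winners)
print(list(filtered))  # ['Key5', 'Key2', 'Key4']

# Optional extra sorted() between steps 3 and 4 to order the result by key
ordered = dict(sorted(winners))
print(list(ordered))  # ['Key2', 'Key4', 'Key5']
```

Each group is consumed by max() inside the list comprehension before groupby() advances to the next one, which is what groupby() requires.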
Upvotes: 2