Alex M

Reputation: 187

Python - extract min/max value from list of tuples

I have a list of tuples as follows:

data = [
    ('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'),
    ('A', '98', '104'), ('A', '111', '117'),
    ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'),
    ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'),
    ('C', '157', '162'), ('C', '203', '208'),
    ('D', '166', '173'), ('D', '176', '183'),
    ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')
]

This is a subset of a bigger dataset, extracted to simplify this post. The first element of each tuple (A, B, C, D, E, ...) is an identifier and can appear multiple times.

I would like to extract for each ID/category (A,B,C,D,E...):

1 - minimum from the 2nd element of the tuple

2 - maximum from the 3rd element of the tuple

The final output list should look like:

A: min = 2, max = 117
B: min = 25, max = 251
C: min = 157, max = 208
D: min = 166, max = 183
E: min = 2, max = 117

I tried an approach based on this post: How to remove duplicate from list of tuple when order is important

I simplified for testing by using tuples with only the first 2 elements and extracting the minimum only.

The output looks like this:

('A', '111')
('B', '135')
('C', '157')
('D', '166')
('E', '111')

It should be:

('A', '2')
('B', '25')
('C', '157')
('D', '166')
('E', '2')

I'm looking for an approach that would work with the complete "triple tuple" example, so as to avoid splitting data into multiple subsets.

Many thanks for your time.

EDIT 1 - 2018-10-31

Hello,

Please see my edit below, which includes the code snippet I left out earlier. It produces the erroneous minimum values shown in the preceding part of the post.

data_min_only = [('A', '59'), ('A', '2'), ('A', '87'), ('A', '98'), ('A', '111'), ('B', '66'), ('B', '25'), ('B', '34'), ('B', '46'), ('B', '245'), ('B', '235'), ('B', '224'), ('B', '135'), ('C', '157'), ('C', '203'), ('D', '166'), ('D', '176'), ('E', '59'), ('E', '2'), ('E', '87'), ('E', '98'), ('E', '111')]

from collections import OrderedDict

empty_dict = OrderedDict()

for item in data_min_only:

    # Get old value in dictionary if exist
    old = empty_dict.get(item[0])

    # Skip if new item is larger than old
    if old:
        if item[1] > old[1]:
            continue
        else:
            del empty_dict[item[0]]

    # Assign
    empty_dict[item[0]] = item

list(empty_dict.values())

I was thinking that the order of the tuple values for each category was the problem (they should be sorted from smallest to largest prior to iterating through data_min_only).
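For reference, the erroneous output above is consistent with the values being compared as strings rather than as numbers: '111' sorts before '2' because strings are compared character by character. Below is a minimal sketch on a shortened dataset. The helper `min_per_key` and its `convert` parameter are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict

# Shortened version of data_min_only, for illustration only
pairs = [('A', '59'), ('A', '2'), ('A', '111'),
         ('B', '66'), ('B', '25'), ('B', '135')]

def min_per_key(pairs, convert):
    """Keep the smallest value per key, comparing values via convert()."""
    best = OrderedDict()
    for key, value in pairs:
        old = best.get(key)
        if old is None or convert(value) < convert(old):
            best[key] = value
    return best

as_strings = min_per_key(pairs, str)  # lexicographic comparison
as_ints = min_per_key(pairs, int)     # numeric comparison

print(list(as_strings.items()))  # [('A', '111'), ('B', '135')]
print(list(as_ints.items()))     # [('A', '2'), ('B', '25')]
```

With numeric comparison, the order of the input does not matter, so no prior sorting should be needed.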

Thank you to all posters for their prompt responses and suggestions/solutions! I'm currently working through those to try and understand and adapt them further.

EDIT 2 - 2018-10-31

I tweaked @slider's suggestion to retrieve the difference between min and max. I also tried to collect that result in a list as below, but only the last result appears.

for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print (max(i[1] for i in vals) - min(i[0] for i in vals))
    test_lst = []
    test_lst.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

I also tried this but got the same result:

for i in vals:
    test_lst2 = []
    test_lst2.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

For this kind of loop, what is the best way to extract the results to a list?

Thanks again.

EDIT 3 - 2018-10-31

test_lst = []
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print (max(i[1] for i in vals) - min(i[0] for i in vals))
    test_lst.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

Solution for collecting the loop results: the empty list should be created outside the loop. Please see @slider's comments on his answer below.
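To illustrate why the placement matters, here is a minimal sketch on a shortened dataset. The names `span`, `spans_reset` and `spans_kept` are hypothetical and used only for this comparison.

```python
from itertools import groupby

# Shortened dataset, for illustration only
data = [('A', '59', '62'), ('A', '2', '6'),
        ('B', '66', '71'), ('B', '25', '31')]

def span(group):
    """Return max of the 3rd elements minus min of the 2nd elements."""
    vals = [(int(t[1]), int(t[2])) for t in group]
    return max(v[1] for v in vals) - min(v[0] for v in vals)

# List re-created on every pass: earlier results are discarded
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    spans_reset = []
    spans_reset.append(span(g))

# List created once, before the loop: every result is kept
spans_kept = []
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    spans_kept.append(span(g))

print(spans_reset)  # [46]
print(spans_kept)   # [60, 46]
```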

Upvotes: 2

Views: 5712

Answers (4)

evantkchong

Reputation: 2606

This is another approach, using the pandas library:

import pandas as pd

#The same dataset you provided us
data = [('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'), ('A', '98', '104'), ('A', '111', '117'), ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'), ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'), ('C', '157', '162'), ('C', '203', '208'), ('D', '166', '173'), ('D', '176', '183'), ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')]

#Generate dataframe df
df = pd.DataFrame(data=data)
#Convert strings to their respective numerical values
df[[1, 2]] = df[[1, 2]].apply(pd.to_numeric)

#Group values using column 0
df.groupby(0).agg({1: 'min', 2: 'max'})

We use the agg method with a dictionary as the argument in order to find the minimum in column 1 and the maximum in column 2 for each grouped range.

This gives the following result:

     1    2
0
A    2  117
B   25  251
C  157  208
D  166  183
E    2  117

Upvotes: 1

slider

Reputation: 12990

You can use itertools.groupby to first group by the "id" key, and then compute the min and max for each group:

from itertools import groupby

groups = []
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    groups.append(list(g))

for g in groups:
    print(g[0][0], 'min:', min(int(i[1]) for i in g), 'max:', max(int(i[2]) for i in g))

Output

A min: 2 max: 117
B min: 25 max: 251
C min: 157 max: 208
D min: 166 max: 183
E min: 2 max: 117

Note that you don't have to store the groups first in the groups list; you can directly print the min and max as you're iterating in the groupby for loop:

for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print(k, 'min:', min(i[0] for i in vals), 'max:', max(i[1] for i in vals))
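If the results are needed later rather than just printed, the same loop can fill a dictionary. The sketch below uses a shortened dataset with interleaved ids and assumes the hypothetical names `summary` and `unsorted_keys`. It also shows why the sort matters: `groupby` only groups consecutive items, so unsorted input yields a separate group each time the id changes.

```python
from itertools import groupby
from operator import itemgetter

# Shortened dataset with interleaved ids, for illustration only
data = [('A', '59', '62'), ('B', '66', '71'), ('A', '2', '6'),
        ('B', '25', '31'), ('A', '111', '117')]

# Without sorting, each change of id starts a new group
unsorted_keys = [k for k, _ in groupby(data, key=itemgetter(0))]
print(unsorted_keys)  # ['A', 'B', 'A', 'B', 'A']

# Sort by id first, then reduce each group to (min, max)
summary = {}
for key, group in groupby(sorted(data, key=itemgetter(0)), key=itemgetter(0)):
    rows = [(int(lo), int(hi)) for _, lo, hi in group]
    summary[key] = (min(lo for lo, _ in rows), max(hi for _, hi in rows))

print(summary)  # {'A': (2, 117), 'B': (25, 71)}
```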

Upvotes: 5

Gahan

Reputation: 4213

data = [('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'), ('A', '98', '104'), ('A', '111', '117'), ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'), ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'), ('C', '157', '162'), ('C', '203', '208'), ('D', '166', '173'), ('D', '176', '183'), ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')]


result = {}  # construct result dictionary
for i in data:
    cur_min, cur_max = map(int, i[1:])
    min_i, max_i = result.setdefault(i[0], [cur_min, cur_max])
    if cur_min < min_i:
        result[i[0]][0] = cur_min
    if cur_max > max_i:
        result[i[0]][1] = cur_max
# print(result)  # dictionary containing keys with list of min and max values for given key >>> {'A': [2, 117], 'B': [25, 251], 'C': [157, 208], 'D': [166, 183], 'E': [2, 117]}

for k, v in result.items():  # loop to print output
    print("{} min: {} max: {}".format(k, v[0], v[1]))

Output:

A min: 2 max: 117
B min: 25 max: 251
C min: 157 max: 208
D min: 166 max: 183
E min: 2 max: 117

Upvotes: 2

Maria Nazari

Reputation: 690

Another approach:

max_list = {}
min_list = {}
for i in data:
    if i[0] not in max_list:
        max_list[i[0]] = float('-inf')  # sentinel below any real value
        min_list[i[0]] = float('inf')   # sentinel above any real value

    if max_list[i[0]] < int(i[2]):
        max_list[i[0]] = int(i[2])

    if min_list[i[0]] > int(i[1]):
        min_list[i[0]] = int(i[1])

for ele in max_list:
    print(ele, ' min: ', min_list[ele], 'max: ', max_list[ele])

Upvotes: 2
