Fribbe

Reputation: 47

Find every index of max value in a column in two-dimensional list

I wrote this function to find the max value of a given column in my 2D list. It returns the max value of that column, but the max value actually appears twice, so how do I return the indexes (rows) where it appears?

Cheers

def find_max(column):
    maxVal = 0
    for i in range(1, len(lst)):
        maxVal = max(maxVal, int(lst[i][column]))
    return (maxVal) 

I feel so lost, but I've been trying something like this (obviously not working at the moment, just brainstorming):


def test(column):
    maxVal = 0
    year = []
    for i in range(1, len(lst)):
        if maxVal == int(lst[i][column]):
            year.append(lst[i][0])
        else:
            maxVal = max(maxVal, int(lst[i][column]))
            year = (lst[i][0])
        year.extend(maxVal)
    return year

#so column 0 is years, and I want to save the years where my X column had the biggest value(s). 

Edit: My list looks like this

Let's say the column I'm looking at is the third, and the max value of 27 appears on rows 36 and 38. How do I return these indexes? (What I'm actually looking for is the value in the first column on those rows: 2004 and 2006.)

Upvotes: 2

Views: 791

Answers (3)

Gardener

Reputation: 2660

To return the column index, the max value, and the years, I return one tuple per column. See the printout at the bottom.

I created sample data and then build a tuple for each column of the output. The tuple can easily be changed to a different type of output. Note that the output skips the first column of the array, since that is the year and no max year is needed. Also, the penultimate column contains blank data, so extra logic was added to handle blanks. The code should handle blanks in any column, even though in the sample data they occur in only one. The data_colunns_less_2 value can be modified to change the number of columns.

As with most engineering problems, the first step is to state the problem clearly. By clearly stating the problem, it sometimes becomes trivial to solve:

Given an array stored as a list of rows, where each row is a list of strings, the first column is a year, the remaining columns are data, and some columns contain blanks,

return an output list of tuples, one for each data column.
So, if the original array has n columns, the output list will have n-1 entries, since the year column is not needed.

Further, each tuple shall consist of a column index into the original array, the max value for the column, and a list of the years containing the max value.

import random  # to create test list

def generate_data():
    # create sample list
    # random.seed(365)
    # l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(20)]

    data = []
    data_colunns_less_2 = 8
    for year in range(2000, 2006):
        row = [str(random.randint(0,10)) for _ in range(data_colunns_less_2)]
        row.insert(0, str(year))
        row.append('')
        row.append(str(random.randint(-10,10)))
        data.append(row)
    return data

def print_data(data):
    for row in data: print(row)

def check_int(s): # from https://stackoverflow.com/a/1265696/4983398
    # I like to avoid exceptions
    if len(s) > 0 and s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()

def get_max_tuple_for_column(data, col_index):
    # named max_val so the built-in max() is not shadowed
    max_val = data[0][col_index]
    if check_int(max_val):
        max_is_non_digit = False
        max_val = int(max_val)
    else:
        max_is_non_digit = True

    years = []
    for row in data:
        test_val = row[col_index]
        if not check_int(test_val):
            # blank cell: only recorded while no integer has been seen yet
            if max_is_non_digit:
                years.append(row[0])
            continue
        else:
            val = int(test_val)

        if max_is_non_digit:
            # first integer in this column replaces the blank placeholder
            max_val = val
            max_is_non_digit = False
            years = [int(row[0])]
        elif val > max_val:
            max_val = val
            years = [int(row[0])]  # (re)start a list of years with this max value
        elif val == max_val:
            years.append(int(row[0])) # save an extra year for this column
    return (col_index, max_val, years)

if __name__ == '__main__':

    data = generate_data()
    out_list = [get_max_tuple_for_column(data, col) for col in range(1,len(data[0]))]
    print("Generated Random Dataset:")
    for row in data: print(row)
    print("Output:(col_index, max_value for column, [year1, year2, ..]")
    for row in out_list: print(row)

Output:

Generated Random Dataset:
['2000', '1', '2', '2', '9', '1', '9', '4', '8', '', '-9']
['2001', '9', '2', '9', '10', '6', '3', '10', '2', '', '0']
['2002', '4', '2', '2', '1', '4', '2', '9', '7', '', '-1']
['2003', '8', '4', '0', '9', '4', '10', '6', '4', '', '10']
['2004', '7', '10', '6', '5', '2', '1', '6', '1', '', '3']
['2005', '1', '4', '5', '8', '1', '2', '5', '2', '', '5']
Output:(col_index, max_value for column, [year1, year2, ..]
(1, 9, [2001])
(2, 10, [2004])
(3, 9, [2001])
(4, 10, [2001])
(5, 6, [2001])
(6, 10, [2003])
(7, 10, [2001])
(8, 8, [2000])
(9, '', ['2000', '2001', '2002', '2003', '2004', '2005'])
(10, 10, [2003])
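
For a single column, the same idea can be sketched more compactly in two passes: first find the maximum, then collect every row whose value equals it. This is only a minimal sketch. The function name `years_with_max`, the sample `rows`, and the choice to skip cells that are not valid integers are assumptions made for illustration, not part of the code above.

```python
def years_with_max(rows, col_index):
    """Return (max_value, years) for one data column.

    Assumes each row is a list of strings with the year in column 0.
    Rows whose year or data cell is not a valid integer (e.g. '') are skipped.
    """
    values = []
    for row in rows:
        try:
            values.append((int(row[col_index]), int(row[0])))
        except ValueError:
            continue  # blank or non-numeric cell
    if not values:
        return None, []
    max_value = max(value for value, _ in values)
    years = [year for value, year in values if value == max_value]
    return max_value, years


# hypothetical sample data: year, then two data columns
rows = [
    ['2004', '5', '27'],
    ['2005', '', '12'],
    ['2006', '5', '27'],
]
print(years_with_max(rows, 2))  # (27, [2004, 2006])
print(years_with_max(rows, 1))  # (5, [2004, 2006])
```

Two passes keep the tie handling trivial: there is no list to reset when a larger value shows up later in the column.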

Upvotes: 0

Tsubasa

Reputation: 1429

Try this one.

data = [
    ['foo', 'bar', 'foo', 'bruh', 'test', 'foo', 'bar', 'bar'],
    [2001, 86, 26, 163, 9, 8, 214, 8],
    [2002, 91, 27, 174, 9, 9, 201, 8]
]

def get_max(data):
    """
        Arg     :   `data` -> Type: List
        Returns : `result` -> Type: List
    """

    max_val = 0  # named max_val so the built-in max() is not shadowed
    result = []

    # skip the header row; start=2 gives the 1-based row number within `data`
    for row_number, row in enumerate(data[1:], start=2):
        value = int(row[2])
        if value > max_val:
            max_val = value
            result.clear()
            # result.append(row)
            # if you need the index of where the row appears
            result.append(row_number)

        elif value == max_val:
            # result.append(row)
            result.append(row_number)

    return result
    
print(get_max(data))

Upvotes: 1

xiaoqiao

Reputation: 9

import numpy as np

b = np.array([
        [1, 2, 0],
        [1, 3, 9]
    ])

index = np.unravel_index(b.argmax(), b.shape)   # max num index

print(index)

(1, 2)  # indices are zero-based: row 1, column 2
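
Note that `argmax` reports only the first maximum of the whole array. To get every row where one particular column reaches its maximum, as the question asks, a sketch along these lines may help. The array `table` and its values are made up for illustration.

```python
import numpy as np

# hypothetical data: column 0 is the year, column 2 is the column of interest
table = np.array([
    [2004, 5, 27],
    [2005, 9, 12],
    [2006, 5, 27],
])

column = table[:, 2]
# boolean mask of rows equal to the column maximum -> their zero-based indices
row_indices = np.flatnonzero(column == column.max())
# look up the year (column 0) on each of those rows
years = table[row_indices, 0]

print(row_indices.tolist())  # [0, 2]
print(years.tolist())        # [2004, 2006]
```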

Upvotes: 0
