Reputation: 47
I wrote this function to find the max value of a given column in my 2D list. It returns the max value of that column correctly, but the max value actually appears twice in the column. How do I return the indexes/rows where this max value appears?
Cheers
def find_max(column):
    maxVal = 0
    for i in range(1, len(lst)):
        maxVal = max(maxVal, int(lst[i][column]))
    return (maxVal)
I feel so lost, but I've been trying something like the code below (obviously not working at the moment, just brainstorming):
def test(column):
    maxVal = 0
    year = []
    for i in range(1, len(lst)):
        if maxVal == int(lst[i][column]):
            year.append(lst[i][0])
        else:
            maxVal = max(maxVal, int(lst[i][column]))
            year = (lst[i][0])
    year.extend(maxVal)
    return year
# So column 0 is the years, and I want to save the years where my X column had the biggest value(s).
Let's say the column I'm looking at is the third, and the max value of 27 appears on rows 36 and 38. How do I return these indexes? (What I'm actually looking for is the value in the first column for those rows: 2004 and 2006.)
Upvotes: 2
Views: 791
Reputation: 2660
To return the column index, the max value, and the years, the function returns a tuple for each column. See the printed output at the bottom.
I created sample data to test with. The tuple can easily be changed to a different kind of output. Note that the output skips the first column of the array, since that is the year and no max year is needed. Also, the penultimate column holds blank data, so extra logic was added to handle blanks. The code should handle blanks in any column, even though here they occur in only one. The data_colunns_less_2 value can be modified to change the number of columns.
As with most engineering problems, the first step is to state the problem clearly. Once the problem is stated clearly, it sometimes becomes trivial to solve:
Given an array stored as a list of rows, where each row is a list of strings, the first column is a year, the remaining columns are data, and some columns contain blanks,
return an output list of tuples, one for each data column.
So, if the original array has n columns, the output list will have n-1 entries, since the year column is not needed.
Further, each tuple shall consist of the column index into the original array, the max value for the column, and a list of the years containing the max value.
import random # to create test list

def generate_data():
    # create sample list
    # random.seed(365)
    # l = [random.choice(['s1', 's2', 's3', 's4']) for _ in range(20)]
    data = []
    data_colunns_less_2 = 8
    for year in range(2000, 2006):
        row = [str(random.randint(0,10)) for _ in range(data_colunns_less_2)]
        row.insert(0, str(year))
        row.append('')
        row.append(str(random.randint(-10,10)))
        data.append(row)
    return data

def print_data(data):
    for row in data: print(row)

def check_int(s): # from https://stackoverflow.com/a/1265696/4983398
    # I like to avoid exceptions
    if len(s) > 0 and s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()

def get_max_tuple_for_column(data, col_index):
    # max_val starts as the first cell; it stays a string until an integer is seen
    max_val = data[0][col_index]
    if check_int(max_val):
        max_is_non_digit = False
        max_val = int(max_val)
    else:
        max_is_non_digit = True
    years = []
    for row in data:
        test_val = row[col_index]
        if not check_int(test_val):
            # blank cell: only recorded while no integer has been found yet
            if max_is_non_digit:
                years.append(row[0])
            continue
        else:
            val = int(test_val)
            if max_is_non_digit:
                max_val = val
                max_is_non_digit = False
                years = [int(row[0])]
            elif val > max_val:
                max_val = val
                years = [int(row[0])] # (re)start a list of years with this max value
            elif val == max_val:
                years.append(int(row[0])) # save an extra year for this column
    return (col_index, max_val, years)

if __name__ == '__main__':
    data = generate_data()
    out_list = [get_max_tuple_for_column(data, col) for col in range(1,len(data[0]))]
    print("Generated Random Dataset:")
    for row in data: print(row)
    print("Output:(col_index, max_value for column, [year1, year2, ..]")
    for row in out_list: print(row)
Output:
Generated Random Dataset:
['2000', '1', '2', '2', '9', '1', '9', '4', '8', '', '-9']
['2001', '9', '2', '9', '10', '6', '3', '10', '2', '', '0']
['2002', '4', '2', '2', '1', '4', '2', '9', '7', '', '-1']
['2003', '8', '4', '0', '9', '4', '10', '6', '4', '', '10']
['2004', '7', '10', '6', '5', '2', '1', '6', '1', '', '3']
['2005', '1', '4', '5', '8', '1', '2', '5', '2', '', '5']
Output:(col_index, max_value for column, [year1, year2, ..]
(1, 9, [2001])
(2, 10, [2004])
(3, 9, [2001])
(4, 10, [2001])
(5, 6, [2001])
(6, 10, [2003])
(7, 10, [2001])
(8, 8, [2000])
(9, '', ['2000', '2001', '2002', '2003', '2004', '2005'])
(10, 10, [2003])
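
For the single-column case in the question, a shorter two-pass variant may be enough: find the maximum first, then collect every year whose value equals it. This is only a sketch and not the code above. The name `years_with_max` and the sample rows are made up for illustration, and it assumes that the first column holds the year and that cells which are not integers (such as blanks) should simply be skipped.

```python
def years_with_max(rows, column):
    """Return (max_value, years) for one column of a list of string rows.

    Hypothetical helper: assumes row[0] is the year and that cells which
    cannot be parsed as integers (for example blanks) are ignored.
    """
    values = []
    for row in rows:
        try:
            values.append((int(row[0]), int(row[column])))
        except ValueError:
            continue  # skip blank or non-numeric cells (and any header row)
    if not values:
        return None, []
    # First pass: the maximum. Second pass: every year that reaches it.
    max_value = max(value for _, value in values)
    years = [year for year, value in values if value == max_value]
    return max_value, years


# Made-up sample data: year, then two data columns (the last has a blank).
sample = [
    ['2003', '5', '11'],
    ['2004', '27', ''],
    ['2005', '9', '3'],
    ['2006', '27', '11'],
]
print(years_with_max(sample, 1))  # (27, [2004, 2006])
print(years_with_max(sample, 2))  # (11, [2003, 2006])
```

Unlike the check_int approach, this one relies on catching ValueError from int(), which is a matter of taste.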
Upvotes: 0
Reputation: 1429
Try this one.
data = [
    ['foo', 'bar', 'foo', 'bruh', 'test', 'foo', 'bar', 'bar'],
    [2001, 86, 26, 163, 9, 8, 214, 8],
    [2002, 91, 27, 174, 9, 9, 201, 8]
]

def get_max(data):
    """
    Arg : `data` -> Type: List
    Returns : `result` -> Type: List
    """
    max_val = None  # None instead of 0 so negative columns work too
    result = []
    # skip the header row; i is the row's index in data
    for i, l in enumerate(data[1:], start=1):
        val = int(l[2])
        if max_val is None or val > max_val:
            max_val = val
            result.clear()
            # result.append(l)
            # if you need the index of where the l appears
            # (i + 1 is the 1-based row number, header included)
            result.append(i + 1)
        elif val == max_val:
            # result.append(l)
            result.append(i + 1)
    return result

print(get_max(data))
Upvotes: 1
Reputation: 9
import numpy as np
b = np.array([
    [1, 2, 0],
    [1, 3, 9]
])
index = np.unravel_index(b.argmax(), b.shape) # max num index
print(index)
# (1, 2): indices start at zero, so row 1, column 2
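
Note that `argmax` reports only the first occurrence of the largest value in the whole array. To get every row where one column reaches its maximum, a boolean comparison against the column maximum can be used instead. A minimal sketch, assuming a purely numeric array whose first column is the year; the sample values are invented, with 27 placed twice in the third column as in the question.

```python
import numpy as np

# Invented sample: column 0 is the year, column 2 is the column of interest.
table = np.array([
    [2003, 86, 11],
    [2004, 91, 27],
    [2005, 70, 9],
    [2006, 88, 27],
])

column = 2
values = table[:, column]
# Zero-based row indices of every cell equal to the column maximum.
rows = np.flatnonzero(values == values.max())
years = table[rows, 0]

print(rows.tolist())   # [1, 3]
print(years.tolist())  # [2004, 2006]
```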
Upvotes: 0