lolix

Reputation: 1527

Filter max number by id

I have an array of arrays and I want to get the row with the max value for each id. In the example below, column 2 holds the id and column 4 holds the value. For id = 1 the max value is 308.45. For id = 2 the max value is 310.508474.

input:

[['X', '1', '0', '303.016666'],
['X1',  '1', '1', '305.516666'],
['X2',  '1', '2', '308.45'],
['X3',  '2', '0', '309.409836'],
['X4',  '2', '1', '310.508474'],
['X5',  '2', '2', '308.728813']]

output:

[['X2',  '1', '2', '308.45'],
['X4',  '2', '1', '310.508474']]

How can I do that?

Upvotes: 2

Views: 454

Answers (3)

piRSquared

Reputation: 294488

using pandas

import pandas as pd

df = pd.DataFrame([
        ['X',   1, 0, 303.016666],
        ['X1',  1, 1, 305.516666],
        ['X2',  1, 2, 308.45],
        ['X3',  2, 0, 309.409836],
        ['X4',  2, 1, 310.508474],
        ['X5',  2, 2, 308.728813]]
)

print(df.values[df.groupby(1)[3].idxmax()])

[['X2' 1 2 308.45]
 ['X4' 2 1 310.508474]]
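The snippet above uses numeric columns, while the question's data is all strings. A minimal sketch of the same idea on the original string input, assuming the default integer column labels (the names `rows`, `values`, `idx` and `result` are only illustrative):

```python
import pandas as pd

rows = [
    ['X',  '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813'],
]

df = pd.DataFrame(rows)

# Convert only the value column to float for the comparison;
# the frame itself keeps the original strings.
values = df[3].astype(float)

# For each id (column 1), find the index label of the largest value.
idx = values.groupby(df[1]).idxmax()

# Select those rows and turn them back into a list of lists.
result = df.loc[idx].values.tolist()
print(result)
```

Because the conversion is done on a separate Series, the selected rows come back in the same string form as the input.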

pure numpy

import numpy as np

a = np.array([
        ['X',   1, 0, 303.016666],
        ['X1',  1, 1, 305.516666],
        ['X2',  1, 2, 308.45],
        ['X3',  2, 0, 309.409836],
        ['X4',  2, 1, 310.508474],
        ['X5',  2, 2, 308.728813]
    ], dtype=object)

ids = np.unique(a[:, 1])
grp = np.where(ids == a[:, [1]], 1, np.nan)
expanded_value_column = grp * a[:, [3]].astype(float)
max_positions = np.nanargmax(expanded_value_column, axis=0)

print(a[max_positions])

[['X2' 1 2 308.45]
 ['X4' 2 1 310.508474]]
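To see why this works, here is a small sketch of the same masking trick on a three-row toy example (the toy arrays are made up for illustration and are not part of the original data):

```python
import numpy as np

ids = np.array([1, 2])                 # the unique ids
id_col = np.array([[1], [1], [2]])     # id of each row, as a column vector
val_col = np.array([[3.0], [5.0], [4.0]])

# Broadcasting (3, 1) against (2,) gives a (3, 2) mask:
# one column per id, True where the row belongs to that id.
mask = ids == id_col

# 1 where the row belongs to the id, NaN elsewhere.
grp = np.where(mask, 1, np.nan)

# Each column now holds the values of one id and NaN for all other rows.
expanded = grp * val_col

# nanargmax ignores the NaNs, so it returns the row index of the
# largest value within each id's column.
max_positions = np.nanargmax(expanded, axis=0)
print(max_positions)
```

Note that the intermediate array has one column per unique id, so memory grows with rows times ids.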


Upvotes: 5

Moinuddin Quadri

Reputation: 48090

You can use a dict comprehension, with a set() to collect the unique ids:

my_data = [
    ['X', '1', '0', '303.016666'],
    ['X1',  '1', '1', '305.516666'],
    ['X2',  '1', '2', '308.45'],
    ['X3',  '2', '0', '309.409836'],
    ['X4',  '2', '1', '310.508474'],
    ['X5',  '2', '2', '308.728813']]

# Unique ids
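my_id = {row[1] for row in my_data}

# Compare the values as floats: max() on the raw strings would compare
# them lexicographically. Also avoid shadowing the built-in id().
my_max = {i: max((val for _, j, _, val in my_data if j == i), key=float) for i in my_id}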
# Content of 'my_max': {'1': '308.45', '2': '310.508474'}
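The dict above holds only the max value per id, while the question asks for the whole row. Since the values are strings, they also need to be compared as floats. A rough sketch of both points, assuming the same column layout (the names `ids` and `result` are only illustrative):

```python
my_data = [
    ['X',  '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813'],
]

# Strings compare character by character, so '9.5' beats '10.2'.
print(max(['9.5', '10.2']))             # '9.5'
print(max(['9.5', '10.2'], key=float))  # '10.2'

# Unique ids, sorted so the output order is predictable.
ids = sorted({row[1] for row in my_data})

# For each id, keep the whole row whose value (column 4) is largest as a float.
result = [
    max((row for row in my_data if row[1] == i), key=lambda row: float(row[3]))
    for i in ids
]
print(result)
```

This scans the data once per id, which is fine for small inputs. With many ids, a single pass that tracks the best row per id avoids the repeated scans.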

Upvotes: 0

pt12lol

Reputation: 2441

The simplest and most intuitive solution I can imagine:

>>> l = [['X', '1', '0', '303.016666'],
... ['X1',  '1', '1', '305.516666'],
... ['X2',  '1', '2', '308.45'],
... ['X3',  '2', '0', '309.409836'],
... ['X4',  '2', '1', '310.508474'],
... ['X5',  '2', '2', '308.728813']]
>>> result = {}
>>> for a, b, c, d in l:
...     if b not in result or float(d) > float(result[b][2]):
...         result[b] = (a, c, d)
... 
>>> result
{'1': ('X2', '2', '308.45'), '2': ('X4', '1', '310.508474')}
>>> result = [(a, b, c, d) for b, (a, c, d) in result.items()]
>>> result
[('X2', '1', '2', '308.45'), ('X4', '2', '1', '310.508474')]

Upvotes: 2
