Reputation: 39

Return lists that have the highest value per group

I currently have a list of locations that I would like to sort out.

The list looks like the following:

list = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

The goal is to select, for every location, the sublist with the highest value at index 1. The final result should look like the following:

correctList = [['Location 1', 5],['Location 2', 6],['Location 3', 5]]

When several entries for a location share the same value, there is no preference between them.

My current solution appends each location's entries to their own list based on name, then applies max() to each of those lists.
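For reference, the approach described above might look roughly like the following. This is only a minimal sketch, not the exact code in use; the names `locations`, `per_location`, and `correct_list` are illustrative, and it assumes Python 3.7+ so that dictionaries keep insertion order.

```python
# Sketch of the "one list per location, then max()" approach.
# All names here are illustrative assumptions.
locations = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
             ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# Collect the values of each location in their own list, keyed by name.
per_location = {}
for name, value in locations:
    per_location.setdefault(name, []).append(value)

# Take the maximum of each per-location list.
correct_list = [[name, max(values)] for name, values in per_location.items()]
print(correct_list)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```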

Upvotes: 1

Answers (4)

yatu

Reputation: 88305

You can use itertools.groupby to select the list with the max second element, once the lists have been sorted using the first element:

from itertools import groupby

# l is the list of [location, value] pairs from the question
s = sorted(l, key=lambda x: x[0])
[max(k) for i,k in groupby(s, key=lambda x: x[0])]
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Where:

sorted(l, key=lambda x: x[0])

[['Location 1', 5],
 ['Location 1', 4],
 ['Location 1', 5],
 ['Location 2', 5],
 ['Location 2', 6],
 ['Location 2', 5],
 ['Location 3', 5],
 ['Location 3', 5]]

Note that max gives the desired output when passed several lists that share the same first element, because lists are compared element by element:

max(['Location 1', 5], ['Location 1', 4], ['Location 1', 5])
# ['Location 1', 5]
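If relying on element-by-element list comparison feels too implicit, the same idea can be written with explicit keys. The following is a sketch under that assumption, using operator.itemgetter; the names `rows`, `rows_sorted`, and `best` are illustrative and not part of the original snippet.

```python
from itertools import groupby
from operator import itemgetter

# Illustrative copy of the question's data.
rows = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
        ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# groupby only groups consecutive items, so sort by location first.
rows_sorted = sorted(rows, key=itemgetter(0))

# Within each group, pick the row whose second element is largest.
best = [max(group, key=itemgetter(1))
        for _, group in groupby(rows_sorted, key=itemgetter(0))]
print(best)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```

With an explicit key, the choice no longer depends on how the remaining elements of each row compare, which may matter if rows carry extra fields.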

Upvotes: 1

Graipher

Reputation: 7206

You can use pandas for this; it makes it easy to group by one key and compute something for each group:

import pandas as pd

df = pd.DataFrame([['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]],
                  columns=["location", "value"])
df.groupby("location").max()
#             value
# location         
# Location 1      5
# Location 2      6
# Location 3      5

If you absolutely need a list of lists afterwards, that is also possible:

df.groupby("location").max().reset_index().values.tolist()
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Note that if this is the only thing you want to do with this data, pandas is probably overkill. But if you need to do more analysis with it, getting used to pandas can speed up a lot of things, since many of its methods are vectorized and implemented in compiled code.

Upvotes: 0

jpp

Reputation: 164833

You can use collections.defaultdict for an O(n) solution:

from collections import defaultdict

L = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],
     ['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]

dd = defaultdict(int)

for location, value in L:
    dd[location] = max(dd[location], value)

print(dd)
# defaultdict(<class 'int'>, {'Location 1': 5, 'Location 2': 6, 'Location 3': 5})

This gives a dictionary mapping. If you are keen on a list of lists:

res = list(map(list, dd.items()))

print(res)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
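One caveat: defaultdict(int) starts every location at 0, so a location whose values are all negative would end up with 0 instead of its real maximum. If that can happen in your data, a plain dict with dict.get avoids the problem. This is a minimal sketch; the `readings` data below is hypothetical and only there to show the negative case.

```python
# Hypothetical data with negative values, for illustration only.
readings = [['Location 1', -5], ['Location 2', -2], ['Location 1', -3]]

best = {}
for location, value in readings:
    # The first time a location is seen, fall back to the value itself
    # rather than to a default of 0.
    best[location] = max(best.get(location, value), value)

res = [[location, value] for location, value in best.items()]
print(res)
# [['Location 1', -3], ['Location 2', -2]]
```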

Upvotes: 1

Dani Mesejo

Reputation: 61930

You could use a dictionary to compute the maximum value per location in O(n):

data = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4], ['Location 2', 6],
        ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

groups = {}
for location, value in data:
    if location not in groups:
        groups[location] = value
    else:
        groups[location] = max(groups[location], value)

result = [[location, value] for location, value in groups.items()]

print(result)

Output

[['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]

Upvotes: 0
