Reputation: 39
I currently have a list of locations that I would like to sort out.
The list looks like the following:
list = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]
The goal is to select, for every location, the highest value found at index 1 of its sublists. The final result should look like the following:
correctList = [['Location 1', 5],['Location 2', 6],['Location 3', 5]]
When a location has several entries with the same value, it does not matter which one is kept.
The solution that I have now appends each location to its own list based on name, then uses a max() operation on each location list.
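For reference, a minimal sketch of that current approach. The names data, values_by_location and correct_list are placeholders chosen for this example, not names from my actual code:

```python
# Sketch of the approach described above: collect the values of each
# location in its own list, then take max() of every list.
data = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
        ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

values_by_location = {}  # placeholder name: location -> list of its values
for location, value in data:
    values_by_location.setdefault(location, []).append(value)

# One [location, highest value] pair per location, in first-seen order
correct_list = [[location, max(values)] for location, values in values_by_location.items()]
print(correct_list)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```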
Upvotes: 1
Views: 59
Reputation: 88305
You can use itertools.groupby to select the list with the max second element, once the lists have been sorted using the first element:
from itertools import groupby
# l is the list of [location, value] pairs from the question
s = sorted(l, key=lambda x: x[0])
[max(k) for i,k in groupby(s, key=lambda x: x[0])]
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
Where:
sorted(l, key=lambda x: x[0])
[['Location 1', 5],
['Location 1', 4],
['Location 1', 5],
['Location 2', 5],
['Location 2', 6],
['Location 2', 5],
['Location 3', 5],
['Location 3', 5]]
Note that max compares lists element by element, so when the first elements are equal it returns the list with the largest second element:
max(['Location 1', 5], ['Location 1', 4], ['Location 1', 5])
#['Location 1', 5]
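If you would rather not rely on list comparison (for example, if the sublists later gain extra elements), a variant that passes an explicit key to max could look like the sketch below. Using operator.itemgetter here is my own choice for illustration, and the name result is a placeholder:

```python
from itertools import groupby
from operator import itemgetter

l = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4],
     ['Location 2', 6], ['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]

# groupby only groups consecutive items, so sort by location first
s = sorted(l, key=itemgetter(0))

# For each group, keep the sublist whose second element is largest
result = [max(group, key=itemgetter(1)) for _, group in groupby(s, key=itemgetter(0))]
print(result)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
```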
Upvotes: 1
Reputation: 7206
You can use pandas for this. It makes it very easy to group by one key and calculate something for each group:
import pandas as pd
df = pd.DataFrame([['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]],
columns=["location", "value"])
df.groupby("location").max()
#             value
# location
# Location 1      5
# Location 2      6
# Location 3      5
If you absolutely need a list of lists afterwards, that is also possible:
df.groupby("location").max().reset_index().values.tolist()
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
Note that if this is the only thing you want to do with this data, this is probably overkill. But if you need to do some more analysis with it, getting used to pandas can speed up a lot of things, since most of its methods are vectorized and written in C.
Upvotes: 0
Reputation: 164833
You can use collections.defaultdict for an O(n) solution:
from collections import defaultdict
L = [['Location 1', 5],['Location 2', 5],['Location 3', 5],['Location 1', 4],
['Location 2', 6],['Location 3', 5],['Location 1', 5],['Location 2', 5]]
dd = defaultdict(int)  # missing keys start at 0, so this assumes non-negative values
for location, value in L:
    dd[location] = max(dd[location], value)
print(dd)
# defaultdict(<class 'int'>, {'Location 1': 5, 'Location 2': 6, 'Location 3': 5})
This gives a dictionary mapping. If you are keen on a list of lists:
res = list(map(list, dd.items()))
print(res)
# [['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
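One caveat: defaultdict(int) treats a missing location as 0, so a location whose values are all negative would wrongly come out as 0. If negative values are possible, a plain dict with dict.get avoids that. This is a sketch of a different technique, not the defaultdict approach itself, and the data below is made up for illustration:

```python
# Made-up data with negative values, purely to illustrate the caveat
L = [['Location 1', -5], ['Location 2', -2], ['Location 1', -3]]

dd = {}
for location, value in L:
    # Fall back to the current value when the location has not been seen yet
    dd[location] = max(dd.get(location, value), value)

res = list(map(list, dd.items()))
print(res)
# [['Location 1', -3], ['Location 2', -2]]
```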
Upvotes: 1
Reputation: 61930
You could use a dictionary to compute the maximum value per location in O(n):
data = [['Location 1', 5], ['Location 2', 5], ['Location 3', 5], ['Location 1', 4], ['Location 2', 6],
['Location 3', 5], ['Location 1', 5], ['Location 2', 5]]
groups = {}
for location, value in data:
    if location not in groups:
        groups[location] = value
    else:
        groups[location] = max(groups[location], value)
result = [[location, value] for location, value in groups.items()]
print(result)
Output
[['Location 1', 5], ['Location 2', 6], ['Location 3', 5]]
Upvotes: 0