This code:
from itertools import groupby, count
L = [38, 98, 110, 111, 112, 120, 121, 898]
groups = groupby(L, key=lambda item, c=count():item-next(c))
tmp = [list(g) for k, g in groups]
takes [38, 98, 110, 111, 112, 120, 121, 898], groups it by consecutive numbers, and merges each group to give this final output:
['38', '98', '110,112', '120,121', '898']
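The snippet above stops at tmp, the list of grouped runs; the step that turns each run into a string is not shown. A minimal sketch of one way to reach the quoted output, assuming a run of more than one item is rendered as its first and last value joined by a comma (the name merged is made up for illustration):

```python
from itertools import count, groupby

L = [38, 98, 110, 111, 112, 120, 121, 898]

# Consecutive values share the same (value - position) difference.
groups = groupby(L, key=lambda item, c=count(): item - next(c))
tmp = [list(g) for _, g in groups]

# Assumed formatting step: single values as-is, longer runs as "first,last".
merged = [
    str(run[0]) if len(run) == 1 else "{},{}".format(run[0], run[-1])
    for run in tmp
]
print(merged)  # ['38', '98', '110,112', '120,121', '898']
```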
How can the same be done with a list of lists with multiple columns, like the list below, grouping the rows by name and by consecutive values in the second column, and then merging them?
In other words, this data:
L= [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2']]
should give the following output:
[['Italy','1-2-3','3-1-10'],
['France','5','3'],
['Spain','4-5-6','2-8-4'],
['Spain','20','2']]
Would more-itertools be more appropriate for this task?
Group and combine items of multiple-column lists with itertools/more-itertools in Python
Upvotes: 5
Views: 1747
Reputation: 44545
Here is how one might use more_itertools, a third-party library of itertools-like recipes.
more_itertools.consecutive_groups can group consecutive items by some condition.
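To make the idea concrete without installing anything, here is a rough standard-library sketch of the same kind of grouping. The helper name consecutive_runs is hypothetical and this is not the library's own code; it only assumes that "consecutive" means the ordering value increases by exactly 1 from one item to the next.

```python
from itertools import groupby
from operator import itemgetter


def consecutive_runs(iterable, ordering=lambda x: x):
    """Yield lists of items whose ordering values increase by exactly 1."""
    # index - ordering(item) stays constant while the values are consecutive.
    for _, grp in groupby(enumerate(iterable), key=lambda t: t[0] - ordering(t[1])):
        yield [item for _, item in grp]


rows = [(1, 'a'), (2, 'b'), (3, 'c'), (7, 'd'), (8, 'e')]
runs = list(consecutive_runs(rows, ordering=itemgetter(0)))
print(runs)  # [[(1, 'a'), (2, 'b'), (3, 'c')], [(7, 'd'), (8, 'e')]]
```

The ordering argument plays the same role as the lambda x: x[0] passed in the answer's code: it picks out which part of each row is tested for consecutiveness.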
Given
import collections as ct
import more_itertools as mit
lst = [
['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2']
]
Code
Pre-process data into a dictionary for fast, flexible lookups:
dd = ct.defaultdict(list)
for row in lst:
    dd[row[0]].append(row[1:])
dd
Intermediate Output
defaultdict(list,
{'France': [['5', '3']],
'Italy': [['1', '3'], ['2', '1'], ['3', '10']],
'Spain': [['4', '2'], ['5', '8'], ['6', '4'], ['20', '2']]})
Now build whatever output you wish:
result = []
for k, v in dd.items():
    cols = [[int(item) for item in col] for col in zip(*v)]
    grouped_rows = [list(c) for c in mit.consecutive_groups(zip(*cols), lambda x: x[0])]
    grouped_cols = [["-".join(map(str, c)) for c in zip(*grp)] for grp in grouped_rows]
    for grp in grouped_cols:
        result.append([k, *grp])
result
Final Output
[['Italy', '1-2-3', '3-1-10'],
['Spain', '4-5-6', '2-8-4'],
['Spain', '20', '2'],
['France', '5', '3']]
Details
The rows for each country are passed to more_itertools.consecutive_groups. In return are groups of rows based on your condition (here, it is based on the first column, lambda x: x[0], of the dictionary values dd; this is equivalent to the OP's "second column").
Note: resulting order was not specified, but you can sort the output however you wish using sorted() and a key function. In Python 3.6+, insertion order is preserved in the dictionary (an implementation detail in CPython 3.6, guaranteed by the language from 3.7), creating reproducible dictionaries.
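As an illustration of the sorting remark, here is a small sketch that orders the final rows by country and then by the first number in the merged second column. The ordering rule and the name sort_key are assumptions chosen for the example, since the question does not specify an order.

```python
result = [
    ['Italy', '1-2-3', '3-1-10'],
    ['Spain', '4-5-6', '2-8-4'],
    ['Spain', '20', '2'],
    ['France', '5', '3'],
]


def sort_key(row):
    # Country name first, then the first number of the merged second column.
    return row[0], int(row[1].split('-')[0])


ordered = sorted(result, key=sort_key)
for row in ordered:
    print(row)
```

Converting to int in the key matters: compared as strings, '20' would sort before '4-5-6'.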
Upvotes: 0
Reputation: 107347
Instead of using itertools.groupby, which requires multiple sorting, checking, etc., here is an algorithmically optimized approach using dictionaries:
d = {}
flag = False
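for country, i, j in L:
    temp = 1
    try:
        item = int(i)
        # Try to attach the row to an existing group of this country.
        for counter, recs in d[country].items():
            temp += 1
            last = int(recs[-1][0])
            # Accept a value equal or adjacent to the group's last value.
            if item in {last - 1, last, last + 1}:
                recs.append([i, j])
                recs.sort(key=lambda x: int(x[0]))
                flag = True
                break
        if flag:
            flag = False
            continue
        else:
            # No group matched: start a new one under the next number.
            d[country][temp] = [[i, j]]
    except KeyError:
        # First row seen for this country.
        d[country] = {}
        d[country][1] = [[i, j]]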
Demo on a more complex example:
L = [['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2'],
['France', '5', '44'],
['France', '9', '3'],
['Italy', '3', '10'],
['Italy', '5', '17'],
['Italy', '4', '13'],]
{'France': {1: [['5', '3'], ['5', '44']], 2: [['9', '3']]},
'Spain': {1: [['4', '2'], ['5', '8'], ['6', '4']], 2: [['20', '2']]},
'Italy': {1: [['1', '3'], ['2', '1'], ['3', '10'], ['3', '10'], ['4', '13']], 2: [['5', '17']]}}
# You can then produce the results in your intended format as below:
for country, recs in d.items():
    for rec in recs.values():
        i, j = zip(*rec)
        print([country, '-'.join(i), '-'.join(j)])
['France', '5-5', '3-44']
['France', '9', '3']
['Italy', '1-2-3-3-4', '3-1-10-10-13']
['Italy', '5', '17']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
Upvotes: 1
Reputation: 55489
This is essentially the same grouping technique, but rather than using itertools.count it uses enumerate to produce the indices.
First, we sort the data so that all rows for a given country are adjacent and in sorted order. Then we use groupby to make a group for each country. Then we use groupby again in the inner loop to group together the consecutive data for each country. Finally, we use zip and .join to re-arrange the data into the desired output format.
from itertools import groupby
from operator import itemgetter
lst = [
['Italy','1','3'],
['Italy','2','1'],
['Spain','4','2'],
['Spain','5','8'],
['Italy','3','10'],
['Spain','6','4'],
['France','5','3'],
['Spain','20','2'],
]
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), lambda t: int(t[1][1]) - t[0])]
for row in newlst:
    print(row)
output
['France', '5', '3']
['Italy', '1-2-3', '3-1-10']
['Spain', '20', '2']
['Spain', '4-5-6', '2-8-4']
I admit that the lambda is a bit cryptic; it'd probably be better to use a proper def function instead.
Here's the same thing using a much more readable key function.
def keyfunc(t):
    # Unpack the index and data
    i, data = t
    # Get the 2nd column from the data, as an integer
    val = int(data[1])
    # The difference between val & i is constant in a consecutive group
    return val - i
newlst = [[country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(sorted(lst), itemgetter(0))
        for _, g in groupby(enumerate(u), keyfunc)]
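One caveat: sorted(lst) compares the number columns as strings, which is why 'Spain 20' comes out before 'Spain 4-5-6' in the output above, and why a pair such as '9' and '10' would be split into separate groups. A variant sketch, assuming numeric order is what is wanted, sorts on the country and the integer value of the second column instead. The sample rows here are made up to show the difference.

```python
from itertools import groupby
from operator import itemgetter

lst = [
    ['Italy', '9', '3'],
    ['Italy', '10', '1'],
    ['Spain', '4', '2'],
    ['Spain', '20', '2'],
    ['Spain', '5', '8'],
]


def keyfunc(t):
    # val - index is constant within a run of consecutive values.
    i, data = t
    return int(data[1]) - i


# Sort by country, then by the second column as a number (not as a string).
ordered = sorted(lst, key=lambda row: (row[0], int(row[1])))

newlst = [
    [country] + ['-'.join(s) for s in zip(*[v[1][1:] for v in g])]
    for country, u in groupby(ordered, itemgetter(0))
    for _, g in groupby(enumerate(u), keyfunc)
]
for row in newlst:
    print(row)
```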
Upvotes: 0
Reputation: 788
from collections import namedtuple
country = namedtuple('country','name score1 score2')
master_dict = {}    # first run seen for each country
isolated_dict = {}  # rows that do not continue that run
for val in L:
    data = country(*val)
    name = data.name
    if name in master_dict:
        local_data = master_dict[name]
        # Compare against the last number in the merged string; taking only
        # the last character would break for multi-digit values such as '10'.
        if (int(local_data[1].split('-')[-1]) + 1) == int(data.score1):
            local_data[1] += '-' + data.score1
            local_data[2] += '-' + data.score2
        else:
            if name in isolated_dict:
                another_local_data = isolated_dict[name]
                another_local_data[1] += '-' + data.score1
                another_local_data[2] += '-' + data.score2
            else:
                isolated_dict[name] = [name,data.score1,data.score2]
    else:
        master_dict.setdefault(name, [name,data.score1,data.score2])
country_data = list(master_dict.values())+list(isolated_dict.values())
print(country_data)
>>>[['Italy', '1-2-3', '3-1-10'],
['Spain', '4-5-6', '2-8-4'],
['France', '5', '3'],
['Spain', '20', '2']]
Upvotes: 0
Reputation: 251051
You can build on the same recipe and modify the lambda function to include the first item (the country) from each row as well. Secondly, you need to sort the list first, based on the last occurrence of each country in the list.
from itertools import groupby, count
L = [
['Italy', '1', '3'],
['Italy', '2', '1'],
['Spain', '4', '2'],
['Spain', '5', '8'],
['Italy', '3', '10'],
['Spain', '6', '4'],
['France', '5', '3'],
['Spain', '20', '2']]
indices = {row[0]: i for i, row in enumerate(L)}
sorted_l = sorted(L, key=lambda row: indices[row[0]])
groups = groupby(
    sorted_l,
    lambda item, c=count(): [item[0], int(item[1]) - next(c)]
)
for k, g in groups:
    print([k[0]] + ['-'.join(x) for x in zip(*(x[1:] for x in g))])
['Italy', '1-2-3', '3-1-10']
['France', '5', '3']
['Spain', '4-5-6', '2-8-4']
['Spain', '20', '2']
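To see what the sort step contributes, the sketch below prints the indices mapping and the resulting country order for the sample data. It relies on two facts: a dict comprehension keeps the last value written for each key, and sorted is stable, so rows of one country keep their original relative order. The variable country_order is introduced only for this illustration.

```python
L = [
    ['Italy', '1', '3'],
    ['Italy', '2', '1'],
    ['Spain', '4', '2'],
    ['Spain', '5', '8'],
    ['Italy', '3', '10'],
    ['Spain', '6', '4'],
    ['France', '5', '3'],
    ['Spain', '20', '2']]

# Later rows overwrite earlier ones, so each country maps to its last index.
indices = {row[0]: i for i, row in enumerate(L)}
print(indices)  # {'Italy': 4, 'Spain': 7, 'France': 6}

# Stable sort: countries ordered by last occurrence, rows otherwise untouched.
sorted_l = sorted(L, key=lambda row: indices[row[0]])
country_order = [row[0] for row in sorted_l]
print(country_order)
```

Because the rows within a country are not re-sorted, this recipe appears to assume that each country's second-column values already arrive in ascending order.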
Upvotes: 3