Crazie Ash

Reputation: 21

Group a list of tuples into sublists by a common column value

List element format: (x0, y0, x1, y1, "word", block_no, line_no, word_no)

given = [
(518.1566162109375, 381.6667175292969, 537.3801879882812, 391.70867919921875, 'cost', 19, 0, 11), 
(542.1559448242188, 381.6667175292969, 556.5796508789062, 391.70867919921875, 'and', 19, 0, 12), 
(81.36001586914062, 390.6634826660156, 124.58306121826172, 400.7054443359375, 'inventory', 19, 1, 0), 
(129.35882568359375, 390.6634826660156, 167.78199768066406, 400.7054443359375, 'control,', 19, 1, 1)
]

I need to group the elements that share the same "y1" value, producing the result below:

required = [
[
(518.1566162109375, 381.6667175292969, 537.3801879882812, 391.70867919921875, 'cost', 19, 0, 11), 
(542.1559448242188, 381.6667175292969, 556.5796508789062, 391.70867919921875, 'and', 19, 0, 12)
], 
[
(81.36001586914062, 390.6634826660156, 124.58306121826172, 400.7054443359375, 'inventory', 19, 1, 0), 
(129.35882568359375, 390.6634826660156, 167.78199768066406, 400.7054443359375, 'control,', 19, 1, 1)
]
]

What is the best way to achieve this?

Upvotes: 0

Views: 58

Answers (2)

Pygirl

Reputation: 13339

Using itertools.groupby, keyed on y1 (index 3 of each tuple):

import itertools

# y1 is the fourth field (index 3) of each tuple
byloc = lambda x: x[3]
new_list = [list(v) for _, v in itertools.groupby(given, key=byloc)]
new_list

[[(518.1566162109375,
   381.6667175292969,
   537.3801879882812,
   391.70867919921875,
   'cost',
   19,
   0,
   11),
  (542.1559448242188,
   381.6667175292969,
   556.5796508789062,
   391.70867919921875,
   'and',
   19,
   0,
   12)],
 [(81.36001586914062,
   390.6634826660156,
   124.58306121826172,
   400.7054443359375,
   'inventory',
   19,
   1,
   0),
  (129.35882568359375,
   390.6634826660156,
   167.78199768066406,
   400.7054443359375,
   'control,',
   19,
   1,
   1)]]
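
One caveat: itertools.groupby only merges consecutive items with equal keys. It works on the sample because the words are already in reading order. If the input might not be ordered by y1, a minimal sketch is to sort by the same key first. The function name group_by_y1 and the shortened coordinates below are my own illustration, not from the question:

```python
from itertools import groupby
from operator import itemgetter


def group_by_y1(words):
    """Group word tuples by their y1 value (index 3), sorting first."""
    key = itemgetter(3)
    # groupby merges only adjacent items with equal keys, so sort by that key.
    # sorted() is stable, so words keep their relative order inside each group.
    return [list(group) for _, group in groupby(sorted(words, key=key), key=key)]


# Hypothetical, shortened sample in shuffled order (not the asker's exact data).
shuffled = [
    (81.36, 390.66, 124.58, 400.70, 'inventory', 19, 1, 0),
    (518.15, 381.66, 537.38, 391.70, 'cost', 19, 0, 11),
    (129.35, 390.66, 167.78, 400.70, 'control,', 19, 1, 1),
    (542.15, 381.66, 556.57, 391.70, 'and', 19, 0, 12),
]

result = group_by_y1(shuffled)
for line in result:
    print([word[4] for word in line])
```

Sorting changes the order of the groups (ascending y1), which for page coordinates usually means top to bottom.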

Upvotes: 0

rdas

Reputation: 21275

With itertools.groupby & operator.itemgetter:

from itertools import groupby
from operator import itemgetter

given = [
(518.1566162109375, 381.6667175292969, 537.3801879882812, 391.70867919921875, 'cost', 19, 0, 11), 
(542.1559448242188, 381.6667175292969, 556.5796508789062, 391.70867919921875, 'and', 19, 0, 12), 
(81.36001586914062, 390.6634826660156, 124.58306121826172, 400.7054443359375, 'inventory', 19, 1, 0), 
(129.35882568359375, 390.6634826660156, 167.78199768066406, 400.7054443359375, 'control,', 19, 1, 1)
]

grouped_by_y1 = [list(g) for _, g in groupby(given, key=itemgetter(3))]

print(grouped_by_y1)

Output:

[
[(518.1566162109375, 381.6667175292969, 537.3801879882812, 391.70867919921875, 'cost', 19, 0, 11), (542.1559448242188, 381.6667175292969, 556.5796508789062, 391.70867919921875, 'and', 19, 0, 12)],
[(81.36001586914062, 390.6634826660156, 124.58306121826172, 400.7054443359375, 'inventory', 19, 1, 0), (129.35882568359375, 390.6634826660156, 167.78199768066406, 400.7054443359375, 'control,', 19, 1, 1)]
]
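
If you would rather not depend on the input order at all, a dictionary-based grouping is an alternative to groupby. This is only a sketch: group_by_column, index and ndigits are names I made up for illustration. The optional rounding is an assumption for cases where y1 values that belong to the same line differ by tiny floating-point amounts; the sample data in the question does not need it.

```python
from collections import defaultdict


def group_by_column(rows, index=3, ndigits=None):
    """Group rows by the value at `index`, keeping first-seen group order.

    If `ndigits` is given, the key is rounded first so that nearly equal
    floats land in the same group (values close to a rounding boundary can
    still be split, so treat this as a rough tolerance only).
    """
    groups = defaultdict(list)
    for row in rows:
        key = row[index] if ndigits is None else round(row[index], ndigits)
        groups[key].append(row)
    # Dicts preserve insertion order (Python 3.7+), so groups come out
    # in the order their key was first encountered.
    return list(groups.values())


# Hypothetical rows: only index 3 (y1) and the word matter here.
rows = [
    (0.0, 0.0, 0.0, 391.70, 'cost'),
    (0.0, 0.0, 0.0, 400.70, 'inventory'),
    (0.0, 0.0, 0.0, 391.70, 'and'),
]

grouped = group_by_column(rows)
print([[row[4] for row in group] for group in grouped])
```

Unlike the groupby version, rows with the same y1 end up together even when other rows sit between them in the input.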

Upvotes: 1
