Yashaswini Bhat

Reputation: 81

How to get the indices of the longest group among distinct groups of integers in a Python list

Example:

[0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

In this case I need:

  1. 1st '0' group: indices 0-4, length 5
  2. 1st '1' group: indices 5-6, length 2
  3. 2nd '0' group: index 7, length 1
  4. 2nd '1' group: indices 8-17, length 10 <---- NEED THIS: the indices of the longest group of '1's
  5. 3rd '0' group: indices 18-22, length 5

Upvotes: 1

Views: 523

Answers (4)

Jab

Reputation: 27485

I think you are looking for itertools.groupby. With it you can split the original data into a list of lists, one per run of equal integers.

>>> import itertools
>>> data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
>>> [list(group) for _, group in itertools.groupby(data)]
[[0, 0, 0, 0, 0], [1, 1], [0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]

Or, to get the indexes, you can do it in one line using itertools.groupby and operator.itemgetter:

>>> from itertools import groupby
>>> from operator import itemgetter
>>> [sorted(set(itemgetter(0, -1)([i[0] for i in g))) for _, g in groupby(enumerate(data), key=itemgetter(1))]
[[0, 4], [5, 6], [7], [8, 17], [18, 22]]

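Or, to get only the starting or ending indexes, use this (min gives the start index of each group, max the end index). Note that the groups must be built from enumerate(data) so that each item carries its index:

>>> [min(i[0] for i in group) for _, group in groupby(enumerate(data), key=itemgetter(1))]
[0, 5, 7, 8, 18]
>>> [max(i[0] for i in group) for _, group in groupby(enumerate(data), key=itemgetter(1))]
[4, 6, 7, 17, 22]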

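And to get the starting index of the largest group (of any value, not only the 1s), use the following. After next(group) consumes the first item, the sum counts the remaining ones, so every length is off by one, which does not change which group is largest: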

>>> max(([next(group)[0], sum(1 for _ in group)] for _, group in groupby(enumerate(data), key=itemgetter(1))), key=itemgetter(1))[0]
8
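The one-liner above looks at groups of every value. Since the question asks specifically about the longest group of 1s, here is a minimal sketch that restricts the search to one value and returns the start index, end index and length together. The function name `longest_run` and its return shape are my own choices for illustration, not part of itertools.

```python
from itertools import groupby
from operator import itemgetter


def longest_run(data, value):
    """Return (start, end, length) of the longest run of `value`, or None."""
    best = None
    # Group (index, item) pairs by the item, so each group carries its indices.
    for key, group in groupby(enumerate(data), key=itemgetter(1)):
        if key != value:
            continue
        indices = [i for i, _ in group]
        run = (indices[0], indices[-1], len(indices))
        # Strict '>' keeps the earliest run when two runs have equal length.
        if best is None or run[2] > best[2]:
            best = run
    return best


data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_run(data, 1))  # (8, 17, 10)
```

Returning None when the value never occurs is an assumption; raising an error would be an equally reasonable choice.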

Upvotes: 3

Yaakov Bressler

Reputation: 12018

You could iterate using the following function:

import pandas as pd

def count_through_a_list(x):
  """
  Returns all distinct contiguous groups of values in a list.
  Output is a list of records (dicts).
  """
  groups = []
  if not x:
    return groups

  # Initialize these values
  group_start = 0
  group_count = 1
  prev = x[0]

  for i, n in enumerate(x):

    # If n differs from the previous value, the previous group ended at i-1
    if n != prev:
      groups.append({'start': group_start, 'end': i-1, 'value': prev, 'length': i-group_start, 'group_counter': group_count})
      # Reset the appropriate values
      group_count += 1
      group_start = i
      prev = n

  # Close the final group, which runs to the end of the list
  groups.append({'start': group_start, 'end': len(x)-1, 'value': prev, 'length': len(x)-group_start, 'group_counter': group_count})

  return groups

x = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
groups = count_through_a_list(x)

pd.DataFrame(groups, columns=['start', 'end', 'value', 'length', 'group_counter'])

   start  end  value  length  group_counter
0      0    4      0       5              1
1      5    6      1       2              2
2      7    7      0       1              3
3      8   17      1      10              4
4     18   22      0       5              5
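To go from records like these to the group the question asks for, one option is to filter by value and take the record with the greatest length. The sketch below is only illustrative: the records are written out by hand from the five groups listed in the question, and `longest_group` is a hypothetical helper name.

```python
# Records in the same shape as above, one per group in the question's list.
groups = [
    {'start': 0, 'end': 4, 'value': 0, 'length': 5, 'group_counter': 1},
    {'start': 5, 'end': 6, 'value': 1, 'length': 2, 'group_counter': 2},
    {'start': 7, 'end': 7, 'value': 0, 'length': 1, 'group_counter': 3},
    {'start': 8, 'end': 17, 'value': 1, 'length': 10, 'group_counter': 4},
    {'start': 18, 'end': 22, 'value': 0, 'length': 5, 'group_counter': 5},
]


def longest_group(groups, value):
    """Return the record with the greatest 'length' among groups of `value`."""
    candidates = [g for g in groups if g['value'] == value]
    # max() returns the first maximal record, so ties go to the earliest group.
    return max(candidates, key=lambda g: g['length']) if candidates else None


best = longest_group(groups, 1)
print(best['start'], best['end'], best['length'])  # 8 17 10
```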

Upvotes: 0

Derek Eden

Reputation: 4618

You can do it another way, without itertools:

j = 0
for i, val in enumerate(data):
    if i == 0:
        # start the first group; do not compare against data[-1]
        out = [[val]]
    elif val == data[i-1]:
        out[j] += [val]
    else:
        j += 1
        out += [[val]]

output:

[[0, 0, 0, 0, 0], [1, 1], [0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]

Now make a dict with the unique values and the lengths of the sublists for each value:

counts = {}
for o in out:
    if o[0] not in counts:
        counts[o[0]] = [len(o)]
    else:
        counts[o[0]] += [len(o)]

output:

{0: [5, 1, 5], 1: [2, 10]}

Now get the max length of the sequences with the value you are after; in your case it's 1:

max(counts[1])

output:

10

EDIT: to also get the indices of this specific sequence, you can do this:

id0 = 0
for o in out:
    # stop at the first group of 1s that has the max length
    if o[0] == 1 and len(o) == max(counts[1]):
        break
    # otherwise skip past this group
    id0 += len(o)

id1 = id0 + max(counts[1]) - 1
print(max(counts[1]), id0, id1)

output:

10 8 17

It isn't the prettiest... but it works :)
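If only the longest run of a single value is needed, the intermediate lists can be skipped entirely. The following is a rough single-pass sketch of that idea, still without itertools; the function name `longest_run_single_pass` and the `target` parameter are assumptions made for illustration.

```python
def longest_run_single_pass(data, target=1):
    """Return (length, first_index, last_index) of the longest run of target, or None."""
    best = None
    run_start = None  # index where the current run of `target` began
    for i, val in enumerate(data):
        if val == target:
            if run_start is None:
                run_start = i
            length = i - run_start + 1
            # Strict '>' keeps the earliest run when lengths tie.
            if best is None or length > best[0]:
                best = (length, run_start, i)
        else:
            # Any other value ends the current run.
            run_start = None
    return best


data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_run_single_pass(data))  # (10, 8, 17)
```

This keeps only the best run seen so far, so memory use does not grow with the number of groups.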

Upvotes: 1

Karl Knechtel

Reputation: 61498

The standard library provides itertools.groupby for this purpose. It's a bit tricky to use, because it does a lot of work:

>>> from itertools import groupby
>>> data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
>>> groupby(data)
<itertools.groupby object at 0x0000015AB6EB3C78>

Hmm. It doesn't seem very useful yet. But we look at the documentation and see that it's an iterator, so let's try expanding it into a list:

>>> list(groupby(data))
[(0, <itertools._grouper object at 0x0000015AB6EC2BA8>),
 (1, <itertools._grouper object at 0x0000015AB6ED82B0>),
 (0, <itertools._grouper object at 0x0000015AB6ED8518>),
 (1, <itertools._grouper object at 0x0000015AB6EFE780>),
 (0, <itertools._grouper object at 0x0000015AB6F028D0>)]

The 0 and 1 values in here correspond to the 0s and 1s in the original data, but we still have these other objects. Those are also iterators:

>>> [(value, list(grouper)) for value, grouper in groupby(data)]
[(0, [0, 0, 0, 0, 0]), (1, [1, 1]), (0, [0]), (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), (0, [0, 0, 0, 0, 0])]

Now we can see what's going on: the grouper objects generate chunks from the list.

So all we need to do is check the len of those lists and get the maximum value. We fix the comprehension so that we ignore the value and get the len of each grouper, and feed the results to the built-in max instead of making a list:

>>> max(len(list(grouper)) for value, grouper in groupby(data))
10
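That gives the maximum length but not where the group sits. One way to recover the indices with the same len(list(grouper)) idea is to keep a running offset while walking the groups, as in this sketch. The variable names `runs` and `start` are mine, and restricting the search to the value 1 follows the question rather than anything groupby requires.

```python
from itertools import groupby

data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

runs = []   # one (value, start, length) tuple per group
start = 0   # index at which the next group begins
for value, grouper in groupby(data):
    length = len(list(grouper))
    runs.append((value, start, length))
    start += length

# Keep only the groups of 1s and pick the longest one.
value, start, length = max((r for r in runs if r[0] == 1), key=lambda r: r[2])
print(start, start + length - 1, length)  # 8 17 10
```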

Upvotes: 1
