Reputation: 81
Example:
[0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
In this case I need:
'0' group = index: 0-4, length: 5
'1' group = index: 5-6, length: 2
'0' group = index: 7, length: 1
'1' group = index: 8-17, length: 10 <---- NEED THIS: the index of the max-length group of '1's
'0' group = index: 18-22, length: 5
Upvotes: 1
Views: 523
Reputation: 27485
I think you are looking for itertools.groupby. With this you can get a list of lists, one for each run of equal integers in the original dataset.
>>> import itertools
>>> data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
>>> [list(group) for _, group in itertools.groupby(data)]
[[0, 0, 0, 0, 0], [1, 1], [0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
Or to get the indexes, you can also do this in one line using itertools.groupby and operator.itemgetter:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> [sorted(set(itemgetter(0, -1)([i[0] for i in g))) for _, g in groupby(enumerate(data), key=itemgetter(1))]
[[0, 4], [5, 6], [7], [8, 17], [18, 22]]
Or to get just the starting or ending indexes, use this (note that min and max determine whether you get the start or the end index):
>>> [min(i[0] for i in group) for _, group in groupby(enumerate(data), key=itemgetter(1))]
[0, 5, 7, 8, 18]
>>> [max(i[0] for i in group) for _, group in groupby(enumerate(data), key=itemgetter(1))]
[4, 6, 7, 17, 22]
And to get the starting index of the largest group (of any value), use:
>>> max(([next(group)[0], sum(1 for _ in group)] for _, group in groupby(enumerate(data), key=itemgetter(1))), key=itemgetter(1))[0]
8
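The one-liners above consider every group, whatever its value. As a rough sketch of how the same groupby idea could be limited to one value and return the start, end and length together, something like the following may help. The function name longest_run and its return shape are made up for illustration and are not part of itertools.

```python
from itertools import groupby


def longest_run(data, target):
    """Return (start, end, length) of the longest run of `target`, or None.

    Hypothetical helper for illustration; ties go to the earliest run.
    """
    best = None
    position = 0  # index of the first element of the current group
    for value, group in groupby(data):
        length = sum(1 for _ in group)  # consume the group to count it
        if value == target and (best is None or length > best[2]):
            best = (position, position + length - 1, length)
        position += length
    return best


data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_run(data, 1))  # (8, 17, 10)
```

Keeping a running position avoids wrapping the data in enumerate, at the cost of one extra variable.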
Upvotes: 3
Reputation: 12018
You could iterate using the following function:
import pandas as pd

def count_through_a_list(x):
    """
    returns all distinct continuous groups of values in a list
    output is in the form of records
    """
    groups = []
    if not x:
        return groups
    # Initialize these values
    group_start = 0
    group_count = 1
    prev = x[0]
    for i, n in enumerate(x):
        # if n is not the same as the previous value, the current group has ended
        if n != prev:
            groups.append({'start': group_start, 'end': i-1, 'value': prev, 'length': i-group_start, 'group_counter': group_count})
            # Reset the appropriate values
            group_count += 1
            group_start = i
            prev = n
    # The last group runs to the end of the list, so close it after the loop
    groups.append({'start': group_start, 'end': len(x)-1, 'value': prev, 'length': len(x)-group_start, 'group_counter': group_count})
    return groups

groups = count_through_a_list(x)  # x is the list from the question
pd.DataFrame(groups, columns=['start','end','value', 'length', 'group_counter'])
   start  end  value  length  group_counter
0      0    4      0       5              1
1      5    6      1       2              2
2      7    7      0       1              3
3      8   17      1      10              4
4     18   22      0       5              5
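Once the groups are available as records, picking the one the question asks for is a filter plus max. The sketch below assumes records with the same 'start', 'end', 'value' and 'length' keys as above; the values are typed in by hand from the groups listed in the question, not produced by the function.

```python
# Records typed in from the groups listed in the question (illustrative data).
records = [
    {'start': 0, 'end': 4, 'value': 0, 'length': 5},
    {'start': 5, 'end': 6, 'value': 1, 'length': 2},
    {'start': 7, 'end': 7, 'value': 0, 'length': 1},
    {'start': 8, 'end': 17, 'value': 1, 'length': 10},
    {'start': 18, 'end': 22, 'value': 0, 'length': 5},
]

# Keep only the runs of 1s, then take the one with the greatest length.
ones = [r for r in records if r['value'] == 1]
longest = max(ones, key=lambda r: r['length'])

print(longest['start'], longest['end'], longest['length'])  # 8 17 10
```

Note that max raises ValueError if there are no runs of 1s at all, so real code would probably want to check that ones is not empty first.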
Upvotes: 0
Reputation: 4618
You can do it another way without itertools:
j = 0
for i, val in enumerate(data):
    if i == 0:
        # start the first group
        out = [[val]]
    elif val == data[i-1]:
        # same value as the previous element: extend the current group
        out[j] += [val]
    else:
        # value changed: start a new group
        j += 1
        out += [[val]]
output:
[[0, 0, 0, 0, 0], [1, 1], [0], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0]]
Now make a dict with the unique values and the lengths of the sublists for each value:
counts = {}
for o in out:
    if o[0] not in counts:
        counts[o[0]] = [len(o)]
    else:
        counts[o[0]] += [len(o)]
output:
{0: [5, 1, 5], 1: [2, 10]}
Now get the max length of the sequences with the value you are after; in your case it's 1:
max(counts[1])
output:
10
EDIT: to also get the indices of this specific sequence, you can do this:
id0 = 0
for o in out:
    if o[0] == 1 and len(o) == max(counts[1]):
        # found the longest run of 1s; id0 is now its start index
        break
    id0 += len(o)
id1 = id0 + max(counts[1]) - 1
print(max(counts[1]), id0, id1)
output:
10 8 17
It isn't the prettiest... but it works :)
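If only the longest run of one value is needed, the intermediate lists can probably be skipped. Here is a minimal single-pass sketch, still without itertools; the function name longest_run_single_pass is invented for this example, and it returns the values in the same order as the print above (length, start, end).

```python
def longest_run_single_pass(data, target):
    """Return (length, start, end) of the longest run of `target`, or None.

    Hypothetical helper for illustration; ties go to the earliest run.
    """
    best_start, best_len = -1, 0
    run_start, run_len = 0, 0
    for i, val in enumerate(data):
        if val == target:
            if run_len == 0:
                run_start = i  # a new run of `target` begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # any other value ends the current run
    if best_len == 0:
        return None
    return best_len, best_start, best_start + best_len - 1


data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_run_single_pass(data, 1))  # (10, 8, 17)
```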
Upvotes: 1
Reputation: 61498
The standard library provides itertools.groupby for this purpose. It's a bit tricky to use, because it does a lot of work:
>>> from itertools import groupby
>>> data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
>>> groupby(data)
<itertools.groupby object at 0x0000015AB6EB3C78>
Hmm. It doesn't seem very useful yet. But we look at the documentation and see that it's an iterator, so let's try expanding it into a list:
>>> list(groupby(data))
[(0, <itertools._grouper object at 0x0000015AB6EC2BA8>), (1, <itertools._grouper object at 0x0000015AB6ED82B0>), (0, <itertools._grouper object at 0x0000015AB6ED8518>), (1, <itertools._grouper object at 0x0000015AB6EFE780>), (0, <itertools._grouper object at 0x0000015AB6F028D0>)]
The 0 and 1 values in here correspond to the 0s and 1s in the original data, but we still have these other objects. Those are also iterators:
>>> [(value, list(grouper)) for value, grouper in groupby(data)]
[(0, [0, 0, 0, 0, 0]), (1, [1, 1]), (0, [0]), (1, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), (0, [0, 0, 0, 0, 0])]
Now we can see what's going on: the grouper objects generate chunks from the list. So all we need to do is check the len of those lists and get the maximum value. We fix the comprehension so that we ignore the value and get the len of each grouper, and feed the results to the built-in max instead of making a list:
>>> max(len(list(grouper)) for value, grouper in groupby(data))
10
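That expression takes the maximum over every group, so it would report a run of 0s if that happened to be the longest. A small sketch of one way to restrict it to a single value is below; the wrapper name longest_run_length is made up here, and default=0 is just one possible choice for the case where the value never appears.

```python
from itertools import groupby


def longest_run_length(data, target):
    """Length of the longest run of `target` in `data` (0 if it never occurs).

    Hypothetical wrapper for illustration.
    """
    return max(
        (sum(1 for _ in grouper)  # count the group without building a list
         for value, grouper in groupby(data)
         if value == target),
        default=0,  # max() of an empty iterable would otherwise raise ValueError
    )


data = [0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_run_length(data, 1))  # 10
print(longest_run_length([1, 1, 1, 0, 0, 0, 0, 0], 1))  # 3, although the 0s run is longer
```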
Upvotes: 1