Reputation: 65
I would like to find the starting and ending index of each userId in the list. I want to do this without specifying every single userId, because the dataset is large.
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1.......213,213,213,213]
I want the output to be
[{1: 0, 20},{2: 21, 40}.....{213: 29,703, 30,000}]
Is there a package or function that can do this automatically in Python?
Upvotes: 1
Views: 255
Reputation: 5877
You can do this:
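from collections import Counter
a = ...  # your list of userIds; must be sorted so that equal ids are contiguous
a_counter = Counter(a)  # maps each userId to how many times it occurs
a_indices = []
running_count = 0  # index at which the current userId's run starts
for x, x_count in sorted(a_counter.items()):
    a_indices.append({x: (running_count, running_count + x_count - 1)})
    running_count += x_count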
For example, if a = [1, 1, 2, 2, 3, 3], a_indices = [{1: (0, 1)}, {2: (2, 3)}, {3: (4, 5)}] (closest to your output format, while being valid).
If you're willing to slightly change your output format, use:
a_indices = {}
running_count = 0
for x, x_count in sorted(a_counter.items()):
    a_indices[x] = (running_count, running_count + x_count - 1)
    running_count += x_count
Now a_indices, for the a above, will be {1: (0, 1), 2: (2, 3), 3: (4, 5)}, a much nicer structure to work with.
Both of these solutions will make each end index for x inclusive. If you want to make it exclusive, replace running_count + x_count - 1 with running_count + x_count.
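For reference, here is a minimal self-contained sketch of the same idea wrapped in a function. It swaps Counter for itertools.groupby, so it is a variation rather than the exact code above. The function name index_ranges and the inclusive flag are hypothetical names chosen for illustration, and it assumes equal userIds are contiguous in the list (as in the question); if an id appeared in several separate runs, only the last run would be kept.

```python
from itertools import groupby


def index_ranges(values, inclusive=True):
    """Map each value to the (start, end) indices of its run.

    Assumption: equal values are contiguous (e.g. the list is sorted).
    `index_ranges` and `inclusive` are illustrative names, not a library API.
    """
    ranges = {}
    start = 0  # index at which the current run begins
    for key, group in groupby(values):
        length = sum(1 for _ in group)  # number of items in this run
        end = start + length - 1 if inclusive else start + length
        ranges[key] = (start, end)
        start += length
    return ranges


a = [1, 1, 2, 2, 3, 3]
print(index_ranges(a))                   # {1: (0, 1), 2: (2, 3), 3: (4, 5)}
print(index_ranges(a, inclusive=False))  # {1: (0, 2), 2: (2, 4), 3: (4, 6)}
```

Because groupby walks the list once in order, this version does not need to sort the keys, but it still relies on the list itself being grouped by userId.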
Upvotes: 1