benrussell80

Reputation: 347

How to find the number of every length of contiguous sequences of values in a list?

Problem

Given a sequence (list or NumPy array) of 1s and 0s, how can I count the runs of contiguous identical values, broken down by value and by run length? I want to return a JSON-like dictionary of dictionaries, keyed first by value and then by run length.

Example

[0, 0, 1, 1, 0, 1, 1, 1, 0, 0] would return

{
  0: {
    1: 1,
    2: 2
  },
  1: {
    2: 1,
    3: 1
  }
}

Tried

This is the function I have so far:

def foo(arr):
    prev = arr[0]
    count = 1

    lengths = dict.fromkeys(arr, {})

    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            if count in lengths[prev].keys():
                lengths[prev][count] += 1
            else:
                lengths[prev][count] = 1

            prev = i
            count = 1

    return lengths

It outputs identical dictionaries for 0 and 1 even when their runs in the list differ, and it never counts the final run. How can I fix and improve it? Also, if my data is in a NumPy array, does NumPy offer a quicker way to solve this (maybe using np.where(...))?

Upvotes: 1

Views: 233

Answers (1)

Prune

Reputation: 77867

You're suffering from Ye Olde Replication Error. Let's instrument your function to show the problem, adding one line that prints the object ID of each inner dict stored in lengths:

lengths = dict.fromkeys(arr, {})
print(id(lengths[0]), id(lengths[1]))

Output:

140130522360928 140130522360928
{0: {2: 2, 1: 1, 3: 1}, 1: {2: 2, 1: 1, 3: 1}}

The problem is that you gave the same dict as initial value for each key. When you update either of them, you're changing the one object to which they both refer.

Replace that initialization with an explicit loop that creates a new dict for each key, instead of passing a single mutable object as the shared default value:

lengths = dict.fromkeys(arr)   # keys only; every value starts as None
for key in lengths:
    lengths[key] = {}          # a fresh dict per key
print(id(lengths[0]), id(lengths[1]))

Output:

139872021765576 139872021765288
{0: {2: 1, 1: 1}, 1: {2: 1, 3: 1}}

Now you have separate objects.
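The output above still omits the final run of zeros, which is the second issue raised in the question: the loop only records a run when the value changes, so the last run is never stored. A minimal sketch of one way to handle that, keeping the structure of the original function, is to record the pending run once more after the loop. The name run_length_counts is made up for this illustration.

```python
def run_length_counts(arr):
    """Count runs of equal adjacent values, grouped by value and run length."""
    # One fresh dict per distinct value, so the inner dicts are not shared.
    lengths = {key: {} for key in arr}
    if len(arr) == 0:
        return lengths

    prev = arr[0]
    count = 1
    for value in arr[1:]:
        if value == prev:
            count += 1
        else:
            # A run just ended: record its length under the value it held.
            lengths[prev][count] = lengths[prev].get(count, 0) + 1
            prev = value
            count = 1

    # The loop only records a run when the value changes, so the
    # final run has to be recorded after the loop finishes.
    lengths[prev][count] = lengths[prev].get(count, 0) + 1
    return lengths


print(run_length_counts([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
# {0: {2: 2, 1: 1}, 1: {2: 1, 3: 1}}
```

This matches the expected result in the question; only the order of the inner keys differs, because they are inserted in the order the run lengths are first seen.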

If you want a one-liner, use a dict comprehension, which evaluates {} anew for every key:

lengths = {key: {} for key in arr}
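As for the NumPy part of the question, here is one possible vectorized sketch, not the only way to do it. It finds the positions where the value changes with np.flatnonzero (equivalent to np.where(...)[0] on a 1-D array), derives the run lengths with np.diff, and counts them with np.unique. The function name run_length_counts_np is made up for this illustration, and the int(...) conversions assume the array holds integer values such as 0 and 1.

```python
import numpy as np


def run_length_counts_np(arr):
    """Vectorized run-length counting for a 1-D array of integers."""
    arr = np.asarray(arr)
    if arr.size == 0:
        return {}

    # Index of the first element of each run: position 0, plus every
    # position whose value differs from the one before it.
    changes = np.flatnonzero(arr[1:] != arr[:-1]) + 1
    starts = np.concatenate(([0], changes))

    # Each run ends where the next one starts; arr.size closes the last run.
    run_lengths = np.diff(np.concatenate((starts, [arr.size])))
    run_values = arr[starts]

    result = {}
    for value in np.unique(run_values):
        lens, counts = np.unique(run_lengths[run_values == value],
                                 return_counts=True)
        # Convert NumPy scalars to plain ints (assumes integer data).
        result[int(value)] = {int(n): int(c) for n, c in zip(lens, counts)}
    return result


print(run_length_counts_np(np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0])))
# {0: {1: 1, 2: 2}, 1: {2: 1, 3: 1}}
```

Because the last run is closed by appending arr.size to the start indices, it is counted without any special handling after a loop.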

Upvotes: 2
