Given a sequence (list or numpy array) of 1's and 0's, how can I count the contiguous runs of each value, grouped by run length? I want to return a JSON-like dictionary of dictionaries.
[0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
would return
{
    0: {
        1: 1,
        2: 2
    },
    1: {
        2: 1,
        3: 1
    }
}
This is the function I have so far
def foo(arr):
    prev = arr[0]
    count = 1
    lengths = dict.fromkeys(arr, {})
    for i in arr[1:]:
        if i == prev:
            count += 1
        else:
            if count in lengths[prev].keys():
                lengths[prev][count] += 1
            else:
                lengths[prev][count] = 1
            prev = i
            count = 1
    return lengths
It outputs identical dictionaries for 0 and 1 even though the two values appear differently in the list, and it doesn't count the last run. How can I fix and improve it? Also, does numpy offer a quicker way to solve this if my data is in a numpy array (maybe using np.where(...))?
Reputation: 77867
You're suffering from Ye Olde Replication Error. Let's instrument your function to show the problem, adding one line to check the object ID of each inner dict:
lengths = dict.fromkeys(arr, {})
print(id(lengths[0]), id(lengths[1]))
Output:
140130522360928 140130522360928
{0: {2: 2, 1: 1, 3: 1}, 1: {2: 2, 1: 1, 3: 1}}
The problem is that you gave the same dict as initial value for each key. When you update either of them, you're changing the one object to which they both refer.
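To see the aliasing in isolation, here is a minimal sketch outside your function (the variable name shared is just for illustration). dict.fromkeys stores the one value object you pass under every key, so a write through one key shows up under the other:

```python
# dict.fromkeys does not copy its second argument: every key
# ends up pointing at the very same dict object.
shared = dict.fromkeys([0, 1], {})

shared[0]["a"] = 1  # mutate through key 0 only

print(shared[0] is shared[1])  # True
print(shared)                  # {0: {'a': 1}, 1: {'a': 1}}
```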
Rather than passing one mutable object to dict.fromkeys, overwrite each value in an explicit loop, which creates a new dict for each key:
for key in lengths:
    lengths[key] = {}
print(id(lengths[0]), id(lengths[1]))
Output:
139872021765576 139872021765288
{0: {2: 1, 1: 1}, 1: {2: 1, 3: 1}}
Now you have separate objects.
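Note that this output still omits the final run (the trailing [0, 0]), because your loop only records a run when the value changes. One way to handle that, as a rough sketch rather than the only fix, is to record the pending run once more after the loop. The name count_runs and the use of setdefault are my own choices here, not part of your code:

```python
def count_runs(arr):
    """Count contiguous runs of each value, grouped by run length."""
    lengths = {}
    if len(arr) == 0:
        return lengths

    prev = arr[0]
    count = 1
    for value in arr[1:]:
        if value == prev:
            count += 1
            continue
        # Value changed: record the run that just ended.
        runs = lengths.setdefault(prev, {})  # fresh dict per key
        runs[count] = runs.get(count, 0) + 1
        prev = value
        count = 1

    # Record the last run, which the loop never gets to close.
    runs = lengths.setdefault(prev, {})
    runs[count] = runs.get(count, 0) + 1
    return lengths


print(count_runs([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
# {0: {2: 2, 1: 1}, 1: {2: 1, 3: 1}}
```

Because setdefault creates each inner dict on first use, there is no shared object to worry about.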
If you want a one-liner, use a dict comprehension over the existing keys, which evaluates {} once per key:
lengths = {key: {} for key in lengths}
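On the numpy part of the question: I haven't benchmarked it, but one possible vectorized sketch finds the run boundaries by comparing each element with its neighbour, then tallies the run lengths with np.unique(..., return_counts=True). The helper name run_lengths_np is made up for this example, and it assumes a one-dimensional input:

```python
import numpy as np


def run_lengths_np(arr):
    """Count contiguous runs per value and run length (1-D input assumed)."""
    a = np.asarray(arr)
    if a.size == 0:
        return {}

    # Positions where a new run starts (value differs from its predecessor).
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [a.size]))

    run_len = ends - starts   # length of each run
    run_val = a[starts]       # value of each run

    result = {}
    for v in np.unique(run_val):
        lens, counts = np.unique(run_len[run_val == v], return_counts=True)
        # Convert numpy scalars to plain ints so the dict is JSON-friendly.
        result[int(v)] = {int(n): int(c) for n, c in zip(lens, counts)}
    return result


print(run_lengths_np([0, 0, 1, 1, 0, 1, 1, 1, 0, 0]))
# {0: {1: 1, 2: 2}, 1: {2: 1, 3: 1}}
```

This handles the last run without a special case, since the end of the array is appended as the final boundary.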