Reputation: 63
Background
I have a nested list. Each sublist contains four 1s or 0s recording the survival status of a specimen (1 = living, 0 = dead or not yet born). When a specimen is born, its value goes from 0 to 1; when it dies, its value goes from 1 to 0.
Problem
Because specimens can't come back from the dead, I need to identify cases where a specimen was 1, went to 0 at least once, and then went back to 1.
Current Solution
Because each sublist always has exactly 4 entries (each representing a year in which the specimen's living or dead status was observed), I've hard-coded a solution that compares the entries with if/else statements. It is very inelegant, though, and would require exponentially more if/else statements if data were collected for another year.
Is there a better way?
Short, Self-Contained, Compilable Example
#Sample list
my_list = [[1,1,1,1], #no issue, survived 4 years
[1,1,0,1], #data quality issue, died and then born again
[1,0,0,1], #data quality issue, died and then born again
[0,0,0,1], #no issue, born in 4th year
[1,0,0,0]] #no issue, died after 1st year
#Iterate through the list
for each in my_list:
    if ((each[0] == 1) and (each[1] == 1)) and (each[2] == 0) and (each[3] == 1):
        print(str(each) + ' has data quality issues')
    if ((each[0] == 1) and (each[1] == 0)) and (each[2] == 1):
        print(str(each) + ' has data quality issues')
    if ((each[1] == 0) and (each[2] == 0)) and (each[3] == 1):
        if each[0] == 0:
            pass #do nothing
        else:
            print(str(each) + ' has data quality issues')
Output
[1, 1, 0, 1] has data quality issues
[1, 0, 0, 1] has data quality issues
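To put a number on the "exponentially more if/else statements" concern, here is a brute-force sketch that enumerates every possible history for a given number of years and applies the rule literally: a history is invalid if some year is 1, a later year is 0, and an even later year is 1 again. The helper name has_rebirth is hypothetical, not from the post. Since a valid history is either all zeros or has exactly one contiguous block of 1s, the number of invalid histories is 2**n - n*(n+1)/2 - 1, which grows exponentially with the number of years n. For four years the enumeration also turns up [0, 1, 0, 1], which the hard-coded checks above do not flag.

```python
import itertools


def has_rebirth(history):
    # Literal reading of the rule: some year i alive, a later year j dead,
    # and an even later year k alive again (i < j < k).
    return any(
        history[i] == 1 and history[j] == 0 and history[k] == 1
        for i, j, k in itertools.combinations(range(len(history)), 3)
    )


# Count invalid histories among all 2**n possible ones.
for n in range(3, 7):
    bad = [h for h in itertools.product([0, 1], repeat=n) if has_rebirth(h)]
    print(f'{n} years: {len(bad)} of {2 ** n} histories are invalid')

# The invalid 4-year histories, in lexicographic order:
# [[0, 1, 0, 1], [1, 0, 0, 1], [1, 0, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1]]
print([list(h) for h in itertools.product([0, 1], repeat=4) if has_rebirth(h)])
```

The combinations-based check is cubic in the number of years, so it is only meant to illustrate the rule; the answers below show linear-time approaches.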
Upvotes: 1
Views: 60
Reputation: 30933
If we prepend and append a zero, we can account for specimens that were alive at the first or last observation and give them a fake birth/death date. We can then use itertools.groupby() (as suggested by Daweo in a now-deleted answer) to ensure that there are no more than two transitions (i.e. at most three groups):
import itertools
def data_is_ok(lst):
    '''
    >>> data_is_ok([0,0,0,0])
    True
    >>> data_is_ok([1,1,1,1])
    True
    >>> data_is_ok([1,1,0,1])
    False
    >>> data_is_ok([1,0,0,1])
    False
    >>> data_is_ok([0,0,0,1])
    True
    >>> data_is_ok([0,0,1,0])
    True
    >>> data_is_ok([1,0,0,0])
    True
    '''
    return len(list(itertools.groupby([0] + lst + [0]))) <= 3
if __name__ == '__main__':
    import doctest
    doctest.testmod()
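The same padding idea can be expressed without groupby by counting transitions directly: with a 0 added at each end, a valid history changes value either zero times (never born) or exactly twice (born, then died). This is a minimal sketch; data_is_ok_zip is a hypothetical name chosen so it does not clash with data_is_ok above, and it accepts a history of any length.

```python
def data_is_ok_zip(lst):
    # Add a fake "unborn" year in front and a fake "dead" year at the end.
    padded = [0] + list(lst) + [0]
    # Count the years whose value differs from the previous year.
    transitions = sum(a != b for a, b in zip(padded, padded[1:]))
    # Zero transitions: never born; two: born once and died once.
    return transitions <= 2


my_list = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
for each in my_list:
    if not data_is_ok_zip(each):
        print(each, 'has data quality issues')
```

Because both ends are padded with 0, the transition count is always even, so "at most two transitions" here is the same condition as "at most three groups" in the groupby version.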
Upvotes: 2
Reputation: 1265
You can use the itertools.dropwhile function for this.
import itertools
for lifetime in my_list:
    iter_lifetime = iter(lifetime)
    after_birth = itertools.dropwhile(lambda x: x == 0, iter_lifetime)
    after_death = itertools.dropwhile(lambda x: x == 1, after_birth)
    after_rebirth = itertools.dropwhile(lambda x: x == 0, after_death)
    if next(after_rebirth, None) is not None:
        print(lifetime, "has quality issues")
Explanation: first, we drop all the leading zeros from the list (converted to an iterator), giving after_birth. From the rest, we drop the leading ones (i.e. the "life" of the specimen), giving after_death. Finally, we drop the leading zeros from what remains, giving after_rebirth. This should exhaust the iterator; if anything is left, the specimen was reborn, and that is the final check.
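The three dropwhile stages correspond to three phases of a lifetime: unborn, alive, dead. As an alternative to chaining iterators, the same phases can be tracked with an explicit state in a single loop. This is a swapped-in state-machine sketch, not the answer's code, and has_quality_issue is a hypothetical name; like the dropwhile version, it stops at the first sign of rebirth and works for histories of any length.

```python
def has_quality_issue(lifetime):
    # Phases mirror the dropwhile stages: unborn -> alive -> dead.
    phase = 'unborn'
    for alive in lifetime:
        if phase == 'unborn' and alive == 1:
            phase = 'alive'   # first birth
        elif phase == 'alive' and alive == 0:
            phase = 'dead'    # first death
        elif phase == 'dead' and alive == 1:
            return True       # alive again after dying: rebirth
    return False


my_list = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
for lifetime in my_list:
    if has_quality_issue(lifetime):
        print(lifetime, "has quality issues")
```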
Upvotes: 3
Reputation: 19432
I would approach this the following way:
You want to detect a "hole" of zeros. In other words (expressed as a regex, which we will use later), we are looking for 10+1 in the list.
Since there is no such thing as a regex for lists, I would turn each list into a "bit" string (using join) and then simply use, you guessed it, a regex:
import re
my_list = [[1, 1, 1, 1],
[1, 1, 0, 1],
[1, 0, 0, 1],
[0, 0, 0, 1],
[1, 0, 0, 0],
[0, 0, 1, 0]]
for each in my_list:
    bitstring = ''.join(map(str, each))
    if re.search("10+1", bitstring):
        print(each, 'has data quality issues')
This gives the same output:
[1, 1, 0, 1] has data quality issues
[1, 0, 0, 1] has data quality issues
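The check can also be flipped around: instead of searching for the forbidden 10+1, require the whole bit string to match the only allowed shape, 0*1*0* (zeros before birth, ones while alive, zeros after death, each part possibly empty). A string of 0s and 1s matches 0*1*0* exactly when it has at most one block of 1s, i.e. when it contains no 10+1, so the two tests are equivalent. VALID_HISTORY and has_issue are hypothetical names for this sketch.

```python
import re

# Allowed shape: unborn zeros, then living ones, then dead zeros.
VALID_HISTORY = re.compile(r'0*1*0*')


def has_issue(each):
    bitstring = ''.join(map(str, each))
    # fullmatch succeeds only if the pattern covers the entire string.
    return VALID_HISTORY.fullmatch(bitstring) is None


my_list = [[1, 1, 1, 1],
           [1, 1, 0, 1],
           [1, 0, 0, 1],
           [0, 0, 0, 1],
           [1, 0, 0, 0],
           [0, 0, 1, 0]]
for each in my_list:
    if has_issue(each):
        print(each, 'has data quality issues')
```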
Upvotes: 3