Reputation: 63

How do you identify values inside a list meeting criteria?

Background

I have a nested list. In each sublist are four 1s or 0s identifying survival (living or dead) for a specimen. When a specimen is born they go from 0 to 1, a specimen death goes from 1 to 0.

Problem

Because specimens can't come back from the dead I need to identify where a specimen was 1, went to 0 at least once and then back to 1.

Current Solution

Because the list always has exactly 4 entries (each representing a year where their living or dead status was observed), I've hard-coded a solution based on comparing the entries with if/else but it is very inelegant and would require exponentially more if/else statements if the data is collected another year.

Is there a better way?

Short, Self Contained, Compilable Example

#Sample list
my_list = [[1,1,1,1], #no issue, survived 4 years
       [1,1,0,1], #data quality issue, died and then born again
       [1,0,0,1], #data quality issue, died and then born again    
       [0,0,0,1], #no issue, born in 4th year
       [1,0,0,0]] #no issue, died after 1st year

#Iterate through the list
for each in my_list:
    if ((each[0] == 1) and (each[1] == 1)) and (each[2] == 0) and (each[3] == 1):
        print(str(each) + ' has data quality issues')
    if ((each[0] == 1) and (each[1] == 0)) and (each[2] == 1):
        print(str(each) + ' has data quality issues')
    if ((each[1] == 0) and (each[2] == 0)) and (each[3] == 1):
        if each[0] == 0:
            pass #do nothing
        else:
            print(str(each) + ' has data quality issues')

Output

[1, 1, 0, 1] has data quality issues
[1, 0, 0, 1] has data quality issues

Upvotes: 1

Answers (3)

Toby Speight

Reputation: 30933

If we prepend and append some zeros, we can account for samples that were alive at the first or last sample, and give them a fake birth/death date. We can then use itertools.groupby() (as suggested by Daweo in a now-deleted answer) to ensure that there are no more than two transitions (i.e. at most three groups):

import itertools

def data_is_ok(lst):
    '''
    >>> data_is_ok([0,0,0,0])
    True
    >>> data_is_ok([1,1,1,1])
    True
    >>> data_is_ok([1,1,0,1])
    False
    >>> data_is_ok([1,0,0,1])
    False
    >>> data_is_ok([0,0,0,1])
    True
    >>> data_is_ok([0,0,1,0])
    True
    >>> data_is_ok([1,0,0,0])
    True
    '''
    return len(list(itertools.groupby([0] + lst + [0]))) <= 3

if __name__ == '__main__':
    import doctest
    doctest.testmod()

Upvotes: 2

Drecker

Reputation: 1265

You can use itertools.dropwhile function for this.

import itertools

for lifetime in my_list:
    iter_lifetime = iter(lifetime)
    after_birth = itertools.dropwhile(lambda x: x == 0, iter_lifetime)
    after_death = itertools.dropwhile(lambda x: x == 1, after_birth)
    after_rebirth = itertools.dropwhile(lambda x: x == 0, after_death)
    if next(after_rebirth, None) is not None:
        print(lifetime, "has quality issues")

Explanation: firstly, we remove all the leading zeroes (after_birth) from the list (converted to iterator). From the rest we remove the ones (i.e., the "life" of the entity; after_death). Finally we then remove all the leading zeros from the rest (after_rebirth). This is expected to remove the rest of the iterator, however if there is something remaining, it means that the entity was reborn -- that is the final check.

Upvotes: 3

Tomerikoo

Reputation: 19432

I would approach this the following way:

Observation

You want to detect when there is a "hole" of zero. In other words (pronounced as a regex - we will use that later) we are looking for 10+1 in the list.

Solution

Since there is no such thing as regex for lists, I would turn the lists to a "bit"-string (using join) and then simply use - you guessed it - regex:

import re

my_list = [[1, 1, 1, 1],
           [1, 1, 0, 1],
           [1, 0, 0, 1],
           [0, 0, 0, 1],
           [1, 0, 0, 0],
           [0, 0, 1, 0]]

for each in my_list:
    bitstring = ''.join(map(str, each))
    if re.search("10+1", bitstring):
        print(each, 'has data quality issues')

Gives the same:

[1, 1, 0, 1] has data quality issues
[1, 0, 0, 1] has data quality issues

Upvotes: 3

How do you identify values inside a list meeting criteria?

Answers (3)

Observation

Solution

Related Questions