Reputation: 645

How do I efficiently check for a run of consecutive values in each row of a pandas DataFrame?

Suppose we have a pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame(
        {'A': [0, 0, 1, 0],
        'a': list('aaaa'),
        'B': [1, 0, 0, 1],
        'b': list('bbbb'),
        'C': [1, 1, 0, 1],
        'c': list('cccc'),
        'D': [0, 1, 0, 1],
        'd': list('dddd')},
        index=[1, 2, 3, 4])

The output would be:

   A  a  B  b  C  c  D  d
1  0  a  1  b  1  c  0  d
2  0  a  0  b  1  c  1  d
3  1  a  0  b  0  c  0  d
4  0  a  1  b  1  c  1  d

Now I want to select the rows of this DataFrame that contain at least, say, two consecutive zeros in columns A, B, C, D.
For the DataFrame above, the rows with index 2 and 3 satisfy this condition: columns A and B of the second row contain zeros, and columns B and C are enough for the third row.

The method should also work if I want to find three or more consecutive zeros.

Ultimately I want a boolean Series that looks like this:

1    False
2     True
3     True
4    False

so that I can use it as a mask on the original DataFrame.
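
To illustrate how such a boolean Series would be used once it exists, here is a minimal sketch. The mask is written out by hand (an assumption for illustration only, it is not computed here), and the names `mask` and `filtered` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [0, 0, 1, 0], 'a': list('aaaa'),
     'B': [1, 0, 0, 1], 'b': list('bbbb'),
     'C': [1, 1, 0, 1], 'c': list('cccc'),
     'D': [0, 1, 0, 1], 'd': list('dddd')},
    index=[1, 2, 3, 4])

# Hand-written mask matching the desired result; its index must align with df's index
mask = pd.Series([False, True, True, False], index=df.index)

# Boolean indexing keeps only the rows where the mask is True
filtered = df[mask]
print(filtered)
```

Only the rows with index 2 and 3 remain in `filtered`.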

Upvotes: 2

Answers (3)

BENY

Reputation: 323396

Using the data setup from cs95's answer:

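import numpy as np

# Keep only the numeric columns and transpose so that rolling() runs across A, B, C, D
u = df.select_dtypes(np.number).T

# A window of two adjacent values sums to 0 only when both are 0 (assumes non-negative values)
(u.rolling(2).sum() == 0).any()
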
1    False
2     True
3     True
4    False
dtype: bool
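
The question also asks for three or more consecutive zeros. One possible way to generalise the rolling idea is sketched below: it first converts the values to 0/1 "is zero" flags, so it does not rely on the data being non-negative, and then looks for a window whose flags sum to the window length. The helper name `rows_with_zero_run` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [0, 0, 1, 0], 'a': list('aaaa'),
     'B': [1, 0, 0, 1], 'b': list('bbbb'),
     'C': [1, 1, 0, 1], 'c': list('cccc'),
     'D': [0, 1, 0, 1], 'd': list('dddd')},
    index=[1, 2, 3, 4])


def rows_with_zero_run(frame, cols, n):
    """Return a boolean Series: True where a row has n consecutive zeros in cols."""
    # 1 where the value is zero, 0 otherwise; transpose so rolling() walks across cols
    is_zero = frame[cols].eq(0).astype(int).T
    # A window of length n consists only of zeros exactly when its flags sum to n
    return (is_zero.rolling(n).sum() == n).any()


mask2 = rows_with_zero_run(df, ['A', 'B', 'C', 'D'], 2)
mask3 = rows_with_zero_run(df, ['A', 'B', 'C', 'D'], 3)
print(mask2)
print(mask3)
```

With `n=3`, only the row with index 3 (columns B, C, D) is flagged for this data.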

Upvotes: 1

Julian

Reputation: 166

You can use pandas' apply function and define your own function checking your condition as follows:

# columns you want to check. Note they have to be in the right order!!
columns = ["A", "B", "C", "D"]

# Custom function you apply over df, takes a row as input
def zeros_condition(row):
    # loop over the columns.
    for n in range(len(columns)-1): 
        # return true if 0s in two adjacent columns, else false
        if row[columns[n]] == row[columns[n+1]] == 0:
            return True
    return False

result = df.apply(zeros_condition, axis=1)

result is:

1    False
2     True
3     True
4    False
dtype: bool
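
If runs longer than two are needed, the same row-wise idea could be extended by counting the current run of zeros. This is only a sketch under that assumption; the function name `has_zero_run` and its `n` parameter are hypothetical, and `apply` with `axis=1` will generally be slower than the vectorised answers on large frames.

```python
import pandas as pd

df = pd.DataFrame(
    {'A': [0, 0, 1, 0], 'a': list('aaaa'),
     'B': [1, 0, 0, 1], 'b': list('bbbb'),
     'C': [1, 1, 0, 1], 'c': list('cccc'),
     'D': [0, 1, 0, 1], 'd': list('dddd')},
    index=[1, 2, 3, 4])

# Columns to check, in the order in which adjacency should be evaluated
columns = ["A", "B", "C", "D"]


def has_zero_run(row, n=2):
    """Return True if the row contains at least n consecutive zeros."""
    run = 0
    for value in row:
        # Extend the current run on a zero, reset it on anything else
        run = run + 1 if value == 0 else 0
        if run >= n:
            return True
    return False


# Extra keyword arguments to apply() are forwarded to the function
result_two = df[columns].apply(has_zero_run, axis=1, n=2)
result_three = df[columns].apply(has_zero_run, axis=1, n=3)
print(result_two)
print(result_three)
```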

Upvotes: 0

cs95

Reputation: 403218

Select the numeric columns, then use shift to compare:

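import numpy as np

# Keep only the numeric columns and transpose so that shift() moves along A, B, C, D
u = df.select_dtypes(np.number).T
# True where a value equals its predecessor and both are 0
((u == u.shift()) & (u == 0)).any()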

1    False
2     True
3     True
4    False
dtype: bool
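
For runs of arbitrary length, a NumPy-based variant is one option. The sketch below uses `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later) rather than the shift comparison above, so it is a different technique; the variable names and the explicit column list are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

df = pd.DataFrame(
    {'A': [0, 0, 1, 0], 'a': list('aaaa'),
     'B': [1, 0, 0, 1], 'b': list('bbbb'),
     'C': [1, 1, 0, 1], 'c': list('cccc'),
     'D': [0, 1, 0, 1], 'd': list('dddd')},
    index=[1, 2, 3, 4])

n = 3  # required run length; must not exceed the number of checked columns
is_zero = df[['A', 'B', 'C', 'D']].to_numpy() == 0

# Shape (rows, cols - n + 1, n): every length-n window along each row
windows = sliding_window_view(is_zero, n, axis=1)

# A row qualifies if any of its windows consists entirely of zeros
mask = pd.Series(windows.all(axis=2).any(axis=1), index=df.index)
print(mask)
```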

Upvotes: 3
