Vanessa S.

Reputation: 133

Check several conditions for all values in a column

I have just started using Python and pandas. I have searched Google and Stack Overflow for an answer to my question but haven't been able to find one. This is what I need to do:

I have a df with several data rows per person (ID) and a variable called response_go, which can be coded 1 or 0 (type int64), such as this one (the real one is much bigger, with 480 rows per person):

   ID  response_go
0   1            1
1   1            0
2   1            0
3   1            1
4   2            1
5   2            0
6   2            1
7   2            1

Now, for each ID (person), I want to check whether that person's entries in response_go are all coded 0, all coded 1, or neither (the else condition). So far, I have come up with this:

    ids = df['ID'].unique()

    for id in ids:   
        if (df.response_go.all() == 1): 
            print "ID:",id,": 100% Go"
        elif (df.response_go.all() == 0):
            print "ID:",id,": 100% NoGo"
    else:
        print "ID:",id,": Mixed Response Pattern"

However, it gives me the following output:

ID: 1 : 100% NoGo
ID: 2 : 100% NoGo
ID: 2 : Mixed Response Pattern

when it should be this (as both ones and zeros are included for each ID):

ID: 1 : Mixed Response Pattern
ID: 2 : Mixed Response Pattern

I am sorry if this question has been asked before, but I found nothing that solves this issue when searching. If it has been answered, please point me to the solution. Thank you, everyone, I really appreciate it!

Upvotes: 2

Views: 62

Answers (1)

cs95

Reputation: 402263

Sample (with different data) -

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID' : [1] * 3 + [2] * 3 + [3] * 3, 
                   'response_go' : [0, 0, 0, 1, 1, 1, 0, 1, 0]})
df

   ID  response_go
0   1            0
1   1            0
2   1            0
3   2            1
4   2            1
5   2            1
6   3            0
7   3            1
8   3            0

Use groupby + mean -

v = df.groupby('ID').response_go.mean()
v

ID
1    0.000000
2    1.000000
3    0.333333
Name: response_go, dtype: float64
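
Why the mean works: response_go only holds 0 or 1, so the group mean is the fraction of 1s. A mean of exactly 1 means every row is 1, and a mean of exactly 0 means every row is 0. As a rough cross-check (a sketch on the sample data above, with `summary`, `all_go` and `all_nogo` being names made up for illustration), the same facts can be read off the per-group min and max -

```python
import pandas as pd

# Sample data from above.
df = pd.DataFrame({'ID': [1] * 3 + [2] * 3 + [3] * 3,
                   'response_go': [0, 0, 0, 1, 1, 1, 0, 1, 0]})

grouped = df.groupby('ID')['response_go']

# For 0/1 data: min == 1 means "all ones", max == 0 means "all zeros".
summary = pd.DataFrame({
    'mean': grouped.mean(),
    'all_go': grouped.min() == 1,
    'all_nogo': grouped.max() == 0,
})
print(summary)
```

This assumes the column really contains nothing but 0 and 1; missing values or other codes would need separate handling.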

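Use np.select to compute your statuses based on the mean of response_go, with the mixed case as the default (a string default keeps it in the same dtype as the string choices) -

u = np.select([v == 1, v == 0], ['100% Go', '100% NoGo'], default='Mixed Response Pattern')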

Or, use a nested np.where to do the same thing (slightly faster) -

u = np.where(v == 1, '100% Go', np.where(v == 0, '100% NoGo', 'Mixed Response Pattern'))
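
To check that both approaches agree, here is a small self-contained sketch. It rebuilds the means by hand instead of reusing `v` from above, and the names `means`, `u_select` and `u_where` are only for illustration -

```python
import numpy as np
import pandas as pd

# Per-ID means, matching the groupby output shown earlier.
means = pd.Series([0.0, 1.0, 1 / 3],
                  index=pd.Index([1, 2, 3], name='ID'),
                  name='response_go')

# np.select: the first matching condition wins, everything else gets the default.
u_select = np.select([means == 1, means == 0],
                     ['100% Go', '100% NoGo'],
                     default='Mixed Response Pattern')

# Nested np.where: the same decision written as if / elif / else.
u_where = np.where(means == 1, '100% Go',
                   np.where(means == 0, '100% NoGo', 'Mixed Response Pattern'))

print(u_select.tolist())
print(u_where.tolist())
```

np.select tends to read better once there are more than two or three conditions; for a simple three-way split either form should be fine.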

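Now, put the result back into a Series with the same index (building a new Series avoids writing strings into the float64 Series in place, which newer pandas versions may warn about) -

v = pd.Series(u, index=v.index, name=v.name)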
v

ID
1                 100% NoGo
2                   100% Go
3    Mixed Response Pattern
Name: response_go, dtype: object
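
As for why the loop in the question misbehaves: `df.response_go.all()` looks at the whole column on every iteration instead of only the rows for the current id, and `.all()` returns a single boolean, so comparing it with 0 is true whenever the column contains at least one zero. The final `else` is also indented to the `for`, so it runs once after the loop finishes rather than per id. If you would rather keep an explicit loop, a minimal sketch (Python 3 print syntax; `classify` and `results` are hypothetical names) could look like this -

```python
import pandas as pd

# Sample data from above.
df = pd.DataFrame({'ID': [1] * 3 + [2] * 3 + [3] * 3,
                   'response_go': [0, 0, 0, 1, 1, 1, 0, 1, 0]})

def classify(responses):
    """Label one person's 0/1 responses."""
    if (responses == 1).all():
        return '100% Go'
    if (responses == 0).all():
        return '100% NoGo'
    return 'Mixed Response Pattern'

results = {}
# groupby yields (id, sub-frame) pairs, so each check only sees one person's rows.
for id_, group in df.groupby('ID'):
    results[int(id_)] = classify(group['response_go'])
    print("ID:", id_, ":", results[int(id_)])
```

The groupby + mean version above avoids the Python-level loop and is likely the better choice for larger frames.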

Upvotes: 2
