GKC
GKC

Reputation: 479

Reassign values to specific columns and merge with the rest of columns using pandas

I have the below dataframe and I want to set value to binary 1 if id: 1 is greater than 100 and id: 2 is less than 100 or to binary 0 if id: 1 is less than 101 and id: 2 is greater than 99.

time                 id   value
2012-11-01 22:00:06   1   500
2012-11-01 22:00:07   1   50
2012-11-01 22:00:08   1   0
2012-11-01 22:00:09   2   45
2012-11-01 22:00:10   2   150
2012-11-01 22:00:11   2   70
2012-11-01 22:00:12   2   20
2012-11-01 22:00:13   2   0
2012-11-01 22:00:13   3   1
2012-11-01 22:00:13   3   0
2012-11-01 22:00:13   4   1
2012-11-01 22:00:13   4   1

If I only have id: 1 and id: 2 in my dataframe I can achieve this by adding a new column as below.

rindx=df[((df['id'] == 1) & (df['value'] > 100)) | ((df['id'] == 2) & (df['value'] < 100))].index 
df.loc[rindx,'threshold']= 1 
rindx=df[((df['id'] == 1) & (df['value'] < 101)) | ((df['id'] == 2) & (df['value'] > 99))].index 
df.loc[rindx,'threshold']= 0

How can I achieve this when I have different ids whose values are not uniform e.g. in this case id: 1 and id: 2 do not have binary values and I need to convert them to binary just like id: 3 and id: 4.

Expected output:

time                 id   value
2012-11-01 22:00:06   1   1
2012-11-01 22:00:07   1   0
2012-11-01 22:00:08   1   0
2012-11-01 22:00:09   2   1
2012-11-01 22:00:10   2   0
2012-06-01 22:00:11   2   1
2012-11-01 22:00:12   2   1
2012-11-01 22:00:13   2   0
2012-11-01 22:00:13   3   1
2012-11-01 22:00:13   3   0
2012-11-01 22:00:13   4   1
2012-11-01 22:00:13   4   1

Upvotes: 1

Views: 30

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150825

You are almost there:

rindx=df[((df['id'] == 1) & (df['value'] > 100)) | ((df['id'] == 2) & (df['value'] < 100))].index 
df.loc[rindx,'threshold']= 1 
rindx=df[((df['id'] == 1) & (df['value'] < 101)) | ((df['id'] == 2) & (df['value'] > 99))].index 
df.loc[rindx,'threshold']= 0

## fillna
df['threshold'] = df['threshold'].fillna(df['value']).astype(int)

Output:

                   time  id  value  threshold
0   2012-11-01 22:00:06   1    500          1
1   2012-11-01 22:00:07   1     50          0
2   2012-11-01 22:00:08   1      0          0
3   2012-11-01 22:00:09   2     45          1
4   2012-11-01 22:00:10   2    150          0
5   2012-11-01 22:00:11   2     70          1
6   2012-11-01 22:00:12   2     20          1
7   2012-11-01 22:00:13   2      0          1
8   2012-11-01 22:00:13   3      1          1
9   2012-11-01 22:00:13   3      0          0
10  2012-11-01 22:00:13   4      1          1
11  2012-11-01 22:00:13   4      1          1

Upvotes: 1

Related Questions