Jack math

Reputation: 67

How to create a new column based on a condition in another column

In pandas, how can I create a new column B based on column A in df, such that:

However, the first B_i value is always 1.

Example:

A B
5 1 (the first B_i)
12 1
14 0
22 1
20 0
33 1

Upvotes: 1

Views: 1133

Answers (3)

Salamander Krajza

Reputation: 236

I was a little confused by your row numbering. If we calculate B_i from the condition on A_(i+1) - A_(i), the missing value should be on the last row, not the first: the first row has both A_(i) and A_(i+1), while the last row has no A_(i+1) value.

Anyway, based on your example, I assumed that the condition is used to calculate B_(i+1), i.e. each row is compared with the previous one.
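To make that alignment concrete, here is a minimal sketch (using the example values from the question) showing that shift(1) puts A_(i-1) on row i, so A - A.shift(1) is the same as A.diff(), and the only missing value is on the first row:

```python
import pandas as pd

# Example values taken from the question
df = pd.DataFrame({"A": [5, 12, 14, 22, 20, 33]})

prev = df["A"].shift(1)   # row i now holds A_(i-1); row 0 becomes NaN
delta = df["A"] - prev    # A_i - A_(i-1), NaN on the first row

# Series.diff() computes the same difference in a single call
assert delta.equals(df["A"].diff())

print(pd.DataFrame({"A": df["A"], "A_prev": prev, "A_i - A_(i-1)": delta}))
```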

import pandas as pd
df = pd.DataFrame(columns=["A"], data=[5, 12, 14, 22, 20, 33])
df['shifted_A'] = df['A'].shift(1)  # Optional: added only to show how shift works in the final DataFrame
df['B'] = 0  # Default value for every row, so B stays an integer column
df.loc[((df['A'] - df['A'].shift(1)) > 5) + (df['A'].shift(1) <= 10), 'B'] = 1  # Set 1 where at least one condition holds
df.loc[df.index == 0, 'B'] = 1  # The first row of B is always 1
print(df)

That prints:

    A  shifted_A  B
0   5        NaN  1
1  12        5.0  1
2  14       12.0  0
3  22       14.0  1
4  20       22.0  0
5  33       20.0  1

I am not sure it is the fastest way, but I think it is one of the easier ones to understand.

A little explanation:

df.loc[mask, columnname] = newvalue updates the values in the given column for every row where the condition (mask) is True.
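A minimal illustration of that pattern on made-up values (the B column starting at 0 and the A > 10 condition are only for demonstration):

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 14], "B": [0, 0, 0]})

mask = df["A"] > 10     # boolean Series, one entry per row
df.loc[mask, "B"] = 1   # only rows where mask is True are updated

print(df)
```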

((df['A'] - df['A'].shift(1)) > 5) + (df['A'].shift(1) <= 10): each condition here returns a Series of True/False values. Adding them gives True wherever at least one of them is True, which is simply a logical OR. If you need AND, you can multiply the conditions instead. The more idiomatic pandas operators for these are | (OR) and & (AND).
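The sketch below uses two hypothetical boolean Series to compare adding and multiplying booleans with the dedicated | and & operators. On boolean data, + yields the same truth values as OR and * the same as AND, while | and & state the intent directly:

```python
import pandas as pd

# Two hypothetical conditions evaluated on four rows
c1 = pd.Series([True, True, False, False])
c2 = pd.Series([True, False, True, False])

or_result = c1 | c2    # True where at least one condition holds
and_result = c1 & c2   # True only where both conditions hold

# On boolean Series, + and * give the same truth values as | and &
assert (c1 + c2).astype(bool).equals(or_result)
assert (c1 * c2).astype(bool).equals(and_result)

print(or_result.tolist())
print(and_result.tolist())
```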

Upvotes: 1

jezrael

Reputation: 863531

Use Series.diff, replace the first missing value with N so that the greater-or-equal comparison with Series.ge returns True for the first row, and combine it with A < 10 using OR (|):

import pandas as pd

df = pd.DataFrame({'A': [5, 12, 14, 22, 20, 33]})
N = 5
df['B'] = (df.A.diff().fillna(N).ge(N) | df.A.lt(10)).astype(int)
print(df)
    A  B
0   5  1
1  12  1
2  14  0
3  22  1
4  20  0
5  33  1
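To see why this works, the sketch below (reusing the question's data; the intermediate column names are only for display) prints each step: diff leaves NaN on the first row, fillna(N) turns it into N so that ge(N) is True there, and lt(10) adds the second condition through |:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 14, 22, 20, 33]})
N = 5

steps = pd.DataFrame({
    "A": df["A"],
    "diff": df["A"].diff(),                   # NaN on the first row
    "filled": df["A"].diff().fillna(N),       # first row becomes N
    "ge_N": df["A"].diff().fillna(N).ge(N),   # diff >= N (True on row 0)
    "lt_10": df["A"].lt(10),                  # A < 10
})
steps["B"] = (steps["ge_N"] | steps["lt_10"]).astype(int)
print(steps)
```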

Upvotes: 1

mozway

Reputation: 262284

Use diff, compare the result to your threshold with le, invert it, and convert the booleans to int:

import pandas as pd
df = pd.DataFrame({'A': [5, 12, 14, 22, 20, 33]})
N = 5
df['B'] = (~df['A'].diff().le(N)).astype(int)

NB. Using le(5) with an inversion, instead of gt(5), gives 1 for the first value: the first difference is NaN, NaN <= 5 evaluates to False, and the inversion turns it into True. Output:

    A  B
0   5  1
1  12  1
2  14  0
3  22  1
4  20  0
5  33  1
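The trick relies on how comparisons treat the missing first difference: any comparison with NaN returns False. A short sketch on the question's data, contrasting ~le(N) with the seemingly equivalent gt(N):

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 12, 14, 22, 20, 33]})
N = 5
d = df["A"].diff()        # first value is NaN

with_gt = d.gt(N)         # NaN > 5 is False, so row 0 would get 0
with_inv_le = ~d.le(N)    # NaN <= 5 is False, inverted to True, so row 0 gets 1

print(with_gt.astype(int).tolist())
print(with_inv_le.astype(int).tolist())
```

Both agree on every row except the first, which is exactly the row the question requires to be 1.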

Updated answer: to add the second condition, simply combine it with OR (|):

df['B'] = (~df['A'].diff().le(5)|df['A'].lt(10)).astype(int)

output: same as above with the provided data
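As a quick consistency check, here is a sketch that wraps the three expressions from this thread in hypothetical helper functions and runs them on the question's data, where they all produce the same B column. Since the exact condition from the question is not shown here, note that the expressions are not identical in general: the fillna/ge version tests diff >= 5 while the others test diff > 5, and the df.loc version checks the previous value (A_(i-1) <= 10) while the others check the current one (A_i < 10). The made-up second series has a difference of exactly 5 to show one such divergence.

```python
import pandas as pd

# Hypothetical helper names; each wraps one expression from the answers above.
def loc_mask_version(a: pd.Series) -> pd.Series:
    prev = a.shift(1)
    # (A_i - A_(i-1) > 5) OR (A_(i-1) <= 10); | used instead of + for the OR
    b = (((a - prev) > 5) | (prev <= 10)).astype(int)
    b.iloc[0] = 1  # the first row is always 1
    return b

def fillna_ge_version(a: pd.Series, n: int = 5) -> pd.Series:
    # (A_i - A_(i-1) >= n) OR (A_i < 10); the first diff is filled with n
    return (a.diff().fillna(n).ge(n) | a.lt(10)).astype(int)

def inverted_le_version(a: pd.Series, n: int = 5) -> pd.Series:
    # NOT(A_i - A_(i-1) <= n) OR (A_i < 10); NaN <= n is False, so row 0 is 1
    return (~a.diff().le(n) | a.lt(10)).astype(int)

def compare(a: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({
        "A": a,
        "loc_mask": loc_mask_version(a),
        "fillna_ge": fillna_ge_version(a),
        "inverted_le": inverted_le_version(a),
    })

example = compare(pd.Series([5, 12, 14, 22, 20, 33]))  # data from the question
edge = compare(pd.Series([5, 20, 25]))                 # made up: last difference is exactly 5
print(example)
print(edge)
```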

Upvotes: 3
